TY - GEN
T1 - AIRS
T2 - 32nd USENIX Security Symposium, USENIX Security 2023
AU - Yu, Jiahao
AU - Guo, Wenbo
AU - Qin, Qi
AU - Wang, Gang
AU - Wang, Ting
AU - Xing, Xinyu
N1 - Publisher Copyright:
© 32nd USENIX Security Symposium, USENIX Security 2023. All rights reserved
PY - 2023
Y1 - 2023
N2 - Recently, we have witnessed the success of deep reinforcement learning (DRL) in many security applications, ranging from malware mutation to selfish blockchain mining. Like all other machine learning methods, the lack of explainability has been limiting its broad adoption as users have difficulty establishing trust in DRL models’ decisions. Over the past years, different methods have been proposed to explain DRL models but unfortunately, they are often not suitable for security applications, in which explanation fidelity, efficiency, and the capability of model debugging are largely lacking. In this work, we propose AIRS, a general framework to explain deep reinforcement learning-based security applications. Unlike previous works that pinpoint important features to the agent’s current action, our explanation is at the step level. It models the relationship between the final reward and the key steps that a DRL agent takes, and thus outputs the steps that are most critical towards the final reward the agent has gathered. Using four representative security-critical applications, we evaluate AIRS from the perspectives of explainability, fidelity, stability, and efficiency. We show that AIRS could outperform alternative explainable DRL methods. We also showcase AIRS’s utility, demonstrating that our explanation could facilitate the DRL model’s failure offset, help users establish trust in a model decision, and even assist the identification of inappropriate reward designs.
UR - http://www.scopus.com/inward/record.url?scp=85176318991&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85176318991&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85176318991
T3 - 32nd USENIX Security Symposium, USENIX Security 2023
SP - 7375
EP - 7392
BT - 32nd USENIX Security Symposium, USENIX Security 2023
PB - USENIX Association
Y2 - 9 August 2023 through 11 August 2023
ER -