TY - GEN
T1 - Joint Differentiable Optimization and Verification for Certified Reinforcement Learning
AU - Wang, Yixuan
AU - Zhan, Simon
AU - Wang, Zhilu
AU - Huang, Chao
AU - Wang, Zhaoran
AU - Yang, Zhuoran
AU - Zhu, Qi
N1 - Publisher Copyright:
© ICCPS 2023. All rights reserved.
PY - 2023/5/9
Y1 - 2023/5/9
N2 - Model-based reinforcement learning (RL) has been widely studied for controller synthesis in cyber-physical systems (CPSs). In particular, for safety-critical CPSs, it is important to formally certify system properties (e.g., safety, stability) under the learned RL controller. However, as existing methods typically conduct formal verification only after the controller has been learned, it is often difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is end-to-end differentiable via the gradients from both the value function and the certificates, formulated as linear programs and semi-definite programs. In experiments, our framework is compared with a baseline model-based stochastic value gradient (SVG) method and its extension to solving constrained Markov decision processes (CMDPs) for safety. The results demonstrate the significant advantages of our framework in finding feasible controllers with certificates, i.e., barrier functions and Lyapunov functions that formally ensure system safety and stability. Our implementation is available on GitHub.
AB - Model-based reinforcement learning (RL) has been widely studied for controller synthesis in cyber-physical systems (CPSs). In particular, for safety-critical CPSs, it is important to formally certify system properties (e.g., safety, stability) under the learned RL controller. However, as existing methods typically conduct formal verification only after the controller has been learned, it is often difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is end-to-end differentiable via the gradients from both the value function and the certificates, formulated as linear programs and semi-definite programs. In experiments, our framework is compared with a baseline model-based stochastic value gradient (SVG) method and its extension to solving constrained Markov decision processes (CMDPs) for safety. The results demonstrate the significant advantages of our framework in finding feasible controllers with certificates, i.e., barrier functions and Lyapunov functions that formally ensure system safety and stability. Our implementation is available on GitHub.
KW - Barrier function
KW - Lyapunov function
KW - RL
KW - Safety
KW - Stability
UR - http://www.scopus.com/inward/record.url?scp=85165935736&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85165935736&partnerID=8YFLogxK
U2 - 10.1145/3576841.3585919
DO - 10.1145/3576841.3585919
M3 - Conference contribution
AN - SCOPUS:85165935736
T3 - ICCPS 2023 - Proceedings of the 2023 ACM/IEEE 14th International Conference on Cyber-Physical Systems with CPS-IoT Week 2023
SP - 132
EP - 141
BT - ICCPS 2023 - Proceedings of the 2023 ACM/IEEE 14th International Conference on Cyber-Physical Systems with CPS-IoT Week 2023
PB - Association for Computing Machinery, Inc
T2 - 14th ACM/IEEE International Conference on Cyber-Physical Systems, with CPS-IoT Week 2023, ICCPS 2023
Y2 - 9 May 2023 through 12 May 2023
ER -