Joint Differential Optimization and Verification for Certified Reinforcement Learning

Yixuan Wang, Simon Zhan, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

Model-based reinforcement learning has been widely studied for controller synthesis in cyber-physical systems (CPSs). In particular, for safety-critical CPSs, it is important to formally certify system properties (e.g., safety, stability) under the learned RL controller. However, because existing methods typically conduct formal verification only after the controller has been learned, it is often difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is end-to-end differentiable through the gradients from the value function and from certificates formulated as linear programs and semidefinite programs. In experiments, our framework is compared with a baseline model-based stochastic value gradient (SVG) method and its extension for solving constrained Markov decision processes (CMDPs) for safety. The results demonstrate the significant advantages of our framework in finding feasible controllers with certificates, i.e., barrier functions and Lyapunov functions that formally ensure system safety and stability. The framework is available on GitHub.
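The idea of letting a certificate condition feed gradients back into controller learning can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical scalar linear system, a fixed quadratic Lyapunov candidate V(x) = x^2, and a closed-form slack in place of the linear and semidefinite programs mentioned in the abstract. All names and constants (`A`, `B`, `MARGIN`, `PENALTY`, `certificate_slack`, `train`) are illustrative assumptions.

```python
# Toy sketch only (assumed setup, not the paper's method):
# scalar system x_next = A*x + B*u with linear controller u = k*x.
A, B = 1.2, 1.0   # hypothetical dynamics; unstable without control since |A| > 1
MARGIN = 0.1      # hypothetical required decrease margin for V(x) = x**2
PENALTY = 10.0    # hypothetical weight on certificate violation


def certificate_slack(k):
    """Lower level, in closed form: V(x_next) - V(x) = slack * x**2.

    A negative slack means V(x) = x**2 decreases along closed-loop
    trajectories, i.e. it is a valid Lyapunov certificate for gain k.
    """
    c = A + B * k
    return c * c - 1.0


def slack_grad(k):
    """Derivative of the slack with respect to the controller gain k."""
    return 2.0 * (A + B * k) * B


def objective_grad(k):
    """Gradient of the upper-level objective k**2 + PENALTY * violation**2.

    k**2 stands in for a control-effort cost; the squared hinge penalizes
    gains for which the certificate condition (with margin) fails.
    """
    violation = max(0.0, certificate_slack(k) + MARGIN)
    return 2.0 * k + 2.0 * PENALTY * violation * slack_grad(k)


def train(k=0.0, lr=0.01, steps=2000):
    """Plain gradient descent on the controller gain."""
    for _ in range(steps):
        k -= lr * objective_grad(k)
    return k


k_star = train()
print("gain:", k_star, "slack:", certificate_slack(k_star))
```

Starting from k = 0, where the slack is positive and no certificate exists, the certificate term pushes the gain toward a region where the slack is negative, while the effort term keeps the gain small. In the setting the abstract describes, the closed-form slack would be replaced by the solution of a certificate-finding program whose gradient with respect to the controller parameters is propagated to the learner.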

Original language: English (US)
Title of host publication: ICCPS 2023 - Proceedings of the 2023 ACM/IEEE 14th International Conference on Cyber-Physical Systems with CPS-IoT Week 2023
Publisher: Association for Computing Machinery, Inc
Pages: 132-141
Number of pages: 10
ISBN (Electronic): 9798400700361
DOIs
State: Published - May 9 2023
Event: 14th ACM/IEEE International Conference on Cyber-Physical Systems, with CPS-IoT Week 2023, ICCPS 2023 - San Antonio, United States
Duration: May 9 2023 to May 12 2023

Publication series

Name: ICCPS 2023 - Proceedings of the 2023 ACM/IEEE 14th International Conference on Cyber-Physical Systems with CPS-IoT Week 2023

Conference

Conference: 14th ACM/IEEE International Conference on Cyber-Physical Systems, with CPS-IoT Week 2023, ICCPS 2023
Country/Territory: United States
City: San Antonio
Period: 5/9/23 to 5/12/23

Keywords

  • Barrier function
  • Lyapunov function
  • RL
  • Safety
  • Stability

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
