TY - JOUR
T1 - Provably global convergence of actor-critic
T2 - 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019
AU - Yang, Zhuoran
AU - Chen, Yongxin
AU - Hong, Mingyi
AU - Wang, Zhaoran
N1 - Publisher Copyright:
© 2019 Neural information processing systems foundation. All rights reserved.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2019
Y1 - 2019
N2 - Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
AB - Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
UR - http://www.scopus.com/inward/record.url?scp=85090171471&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090171471&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85090171471
VL - 32
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
SN - 1049-5258
Y2 - 8 December 2019 through 14 December 2019
ER -