TY - JOUR
T1 - Provably efficient neural GTD algorithm for off-policy learning
AU - Wai, Hoi To
AU - Yang, Zhuoran
AU - Wang, Zhaoran
AU - Hong, Mingyi
N1 - Funding Information:
Acknowledgement & Funding Disclosure: The authors would like to thank Mr. Alan Lun (CUHK) for conducting the preliminary numerical experiments in this paper. H.-T. Wai is supported by the CUHK Direct Grant #4055113. M. Hong is supported in part by NSF under Grant CCF-1651825, CMMI-172775, CIF-1910385 and by AFOSR under grant 19RT0424.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
N2 - This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE). For off-policy learning, we show that the minimum MSBE problem can be recast into a min-max optimization involving a pair of over-parameterized primal-dual NNs. The resultant formulation can then be tackled using a neural GTD algorithm. We analyze the convergence of the proposed algorithm with a 2-layer ReLU NN architecture using m neurons and prove that it computes an approximate optimal solution to the minimum MSBE problem as m → ∞.
AB - This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE). For off-policy learning, we show that the minimum MSBE problem can be recast into a min-max optimization involving a pair of over-parameterized primal-dual NNs. The resultant formulation can then be tackled using a neural GTD algorithm. We analyze the convergence of the proposed algorithm with a 2-layer ReLU NN architecture using m neurons and prove that it computes an approximate optimal solution to the minimum MSBE problem as m → ∞.
UR - http://www.scopus.com/inward/record.url?scp=85108419441&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108419441&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85108419441
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
Y2 - 6 December 2020 through 12 December 2020
ER -