Abstract
We study function approximation for episodic reinforcement learning with the entropic risk measure. We first propose an algorithm with linear function approximation. Compared to existing algorithms, which suffer from improper regularization and regression biases, this algorithm features debiasing transformations in its backward induction and regression procedures. We further propose an algorithm with general function approximation, which is shown to perform implicit debiasing transformations. We prove that both algorithms achieve sublinear regret and demonstrate a tradeoff between generality and efficiency. Our analysis provides a unified framework for function approximation in risk-sensitive reinforcement learning, leading to the first sublinear regret bounds in this setting.
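For readers unfamiliar with the objective named in the abstract, the following is a minimal sketch of the standard entropic risk measure and the corresponding episodic risk-sensitive objective; the notation (risk parameter β, per-step rewards r_h, horizon H) is assumed here for illustration and is not taken from the paper, which may use a different normalization.

```latex
% Entropic risk measure of a random return X with risk parameter beta != 0
% (standard definition; beta > 0 is risk-seeking, beta < 0 is risk-averse):
\[
  \rho_{\beta}(X) \;=\; \frac{1}{\beta}\,\log \mathbb{E}\bigl[\exp(\beta X)\bigr].
\]
% The episodic risk-sensitive objective is then of the form
\[
  \max_{\pi} \;\frac{1}{\beta}\,\log \mathbb{E}_{\pi}\Bigl[\exp\Bigl(\beta \sum_{h=1}^{H} r_h(s_h, a_h)\Bigr)\Bigr],
\]
% which recovers the standard expected-return objective as beta -> 0.
```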
| Original language | English (US) |
| --- | --- |
| Title of host publication | Proceedings of the 38th International Conference on Machine Learning, ICML 2021 |
| Publisher | ML Research Press |
| Pages | 3198-3207 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781713845065 |
| State | Published - 2021 |
| Event | 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online |
| Duration | Jul 18 2021 → Jul 24 2021 |
Publication series
| Name | Proceedings of Machine Learning Research |
| --- | --- |
| Volume | 139 |
| ISSN (Electronic) | 2640-3498 |
Conference
| Conference | 38th International Conference on Machine Learning, ICML 2021 |
| --- | --- |
| City | Virtual, Online |
| Period | 7/18/21 → 7/24/21 |
Funding
We thank the reviewers for their constructive feedback. Z. Yang acknowledges support from the Simons Institute (Theory of Reinforcement Learning). Z. Wang acknowledges support from the National Science Foundation (Awards 2048075, 2008827, 2015568, 1934931), the Simons Institute (Theory of Reinforcement Learning), Amazon, J.P. Morgan, and Two Sigma.
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability