Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach

Yingjie Fei*, Zhuoran Yang, Zhaoran Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

We study function approximation for episodic reinforcement learning with the entropic risk measure. We first propose an algorithm with linear function approximation. Compared to existing algorithms, which suffer from improper regularization and regression biases, this algorithm features debiasing transformations in its backward induction and regression procedures. We further propose an algorithm with general function approximation, which is shown to perform implicit debiasing transformations. We prove that both algorithms achieve sublinear regret and demonstrate a tradeoff between generality and efficiency. Our analysis provides a unified framework for function approximation in risk-sensitive reinforcement learning, which leads to the first sublinear regret bounds in this setting.
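For a return X and risk parameter beta, the entropic risk measure is (1/beta) log E[exp(beta X)]: beta > 0 is risk-seeking, beta < 0 is risk-averse, and beta -> 0 recovers the expected return. The following is a minimal Python sketch of the standard risk-sensitive backward induction under this measure. It assumes a small tabular episodic MDP with known transitions and time-invariant rewards. It is not the authors' algorithm: it omits function approximation, exploration, and the debiasing transformations described above. The function names and the toy MDP are hypothetical and serve only as illustrations.

```python
import math


def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete random variable X.

    values[i] is an outcome of X and probs[i] its probability. beta == 0 is
    treated as the risk-neutral limit, i.e. the plain expectation.
    """
    if beta == 0:
        return sum(p * v for v, p in zip(values, probs))
    # Log-sum-exp shift: subtract the largest exponent so exp() cannot overflow.
    m = max(beta * v for v in values)
    s = sum(p * math.exp(beta * v - m) for v, p in zip(values, probs))
    return (m + math.log(s)) / beta


def risk_sensitive_backward_induction(P, R, H, beta):
    """Finite-horizon dynamic programming under the entropic risk measure.

    Hypothetical tabular setup: P[s][a] lists next-state probabilities,
    R[s][a] is a (time-invariant) reward, and H is the horizon.
    Returns the step-1 value function and a greedy policy per step.
    """
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states  # terminal values V_{H+1} = 0
    policy = []
    for _ in range(H):  # steps H, H-1, ..., 1
        # Q_h(s, a) = R(s, a) + (1/beta) log E_{s' ~ P(.|s, a)} exp(beta * V_{h+1}(s'))
        Q = [[R[s][a] + entropic_risk(V, P[s][a], beta) for a in range(n_actions)]
             for s in range(n_states)]
        greedy = [max(range(n_actions), key=Q[s].__getitem__) for s in range(n_states)]
        V = [Q[s][greedy[s]] for s in range(n_states)]
        policy.insert(0, greedy)  # prepend so policy[0] is the step-1 policy
    return V, policy


# Hypothetical toy MDP: from state 0, action 0 leads to state 3 (reward 0.5 next step);
# action 1 flips a fair coin between state 1 (reward 1) and state 2 (reward 0).
P_TOY = [
    [[0, 0, 0, 1], [0, 0.5, 0.5, 0]],
    [[0, 1, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 1, 0]],
    [[0, 0, 0, 1], [0, 0, 0, 1]],
]
R_TOY = [[0, 0], [1, 1], [0, 0], [0.5, 0.5]]

if __name__ == "__main__":
    for beta in (1.0, -1.0):
        V, policy = risk_sensitive_backward_induction(P_TOY, R_TOY, H=2, beta=beta)
        print(beta, policy[0][0], round(V[0], 4))
```

On this toy MDP the risk-seeking agent (beta = 1) takes the coin flip at step 1, which it values at log((1 + e)/2), about 0.62. The risk-averse agent (beta = -1) instead takes the sure 0.5, so the same expected return is ranked differently depending on the sign of beta.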

Original language: English (US)
Title of host publication: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Publisher: ML Research Press
Pages: 3198-3207
Number of pages: 10
ISBN (Electronic): 9781713845065
State: Published - 2021
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: Jul 18 2021 to Jul 24 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (Electronic): 2640-3498

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 7/18/21 to 7/24/21

Funding

We thank the reviewers for their constructive feedback. Z. Yang acknowledges Simons Institute (Theory of Reinforcement Learning). Z. Wang acknowledges National Science Foundation (Awards 2048075, 2008827, 2015568, 1934931), Simons Institute (Theory of Reinforcement Learning), Amazon, J.P. Morgan, and Two Sigma for their support.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
