Learning Intrinsic Rewards as a Bi-Level Optimization Problem

Lunjun Zhang, Bradly C. Stadie, Jimmy Ba

Research output: Contribution to journal › Conference article › peer-review


Abstract

We reinterpret the problem of finding intrinsic rewards in reinforcement learning (RL) as a bi-level optimization problem. Using this interpretation, we can make use of recent advancements in the hyperparameter optimization literature, mainly from Self-Tuning Networks (STN), to learn intrinsic rewards. To facilitate our method, we introduce a new general conditioning layer: Conditional Layer Normalization (CLN). We evaluate our method on several continuous control benchmarks in the MuJoCo physics simulator. On all of these benchmarks, the intrinsic rewards learned on the fly lead to higher final rewards.
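The Conditional Layer Normalization (CLN) mentioned in the abstract can be pictured as a layer norm whose scale and shift are produced from a conditioning vector, such as the parameters that define the intrinsic reward. The PyTorch sketch below is only an illustrative reading of that idea, not the authors' implementation; the class name ConditionalLayerNorm, the linear conditioning heads, the initialization, and all tensor shapes are assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalLayerNorm(nn.Module):
    """Layer norm whose affine parameters come from a conditioning vector.
    A sketch of one way a CLN-style layer could work; not the paper's code."""

    def __init__(self, num_features: int, cond_dim: int):
        super().__init__()
        self.num_features = num_features
        # Small heads mapping the conditioning vector to a per-feature
        # scale (gamma) and shift (beta).
        self.to_gamma = nn.Linear(cond_dim, num_features)
        self.to_beta = nn.Linear(cond_dim, num_features)
        # Initialize so the layer starts out behaving like plain LayerNorm.
        nn.init.zeros_(self.to_gamma.weight)
        nn.init.ones_(self.to_gamma.bias)
        nn.init.zeros_(self.to_beta.weight)
        nn.init.zeros_(self.to_beta.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Normalize over the feature dimension without built-in affine params,
        # then apply the condition-dependent scale and shift.
        normed = F.layer_norm(x, (self.num_features,))
        gamma = self.to_gamma(cond)   # (batch, num_features)
        beta = self.to_beta(cond)     # (batch, num_features)
        return gamma * normed + beta


# Hypothetical usage: condition a policy hidden layer on a vector of
# intrinsic-reward parameters (shapes are arbitrary, for illustration only).
hidden = torch.randn(32, 256)          # batch of hidden activations
reward_params = torch.randn(32, 16)    # per-sample conditioning vector
cln = ConditionalLayerNorm(256, 16)
out = cln(hidden, reward_params)       # (32, 256)
```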

Original language: English (US)
Pages (from-to): 111-120
Number of pages: 10
Journal: Proceedings of Machine Learning Research
Volume: 124
State: Published - 2020
Event: 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020 - Virtual, Online
Duration: Aug 3, 2020 - Aug 6, 2020

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

