Learning intrinsic rewards as a Bi-Level optimization problem

Lunjun Zhang, Bradly C. Stadie, Jimmy Ba

Research output: Contribution to conference › Paper › peer-review

8 Scopus citations

Abstract

We reinterpret the problem of finding intrinsic rewards in reinforcement learning (RL) as a bilevel optimization problem. Using this interpretation, we can make use of recent advancements in the hyperparameter optimization literature, mainly from Self-Tuning Networks (STN), to learn intrinsic rewards. To facilitate our methods, we introduce a new general conditioning layer: Conditional Layer Normalization (CLN). We evaluate our method on several continuous control benchmarks in the MuJoCo physics simulator. On all of these benchmarks, the intrinsic rewards learned on the fly lead to higher final rewards.

Original language: English (US)
Pages: 111-120
Number of pages: 10
State: Published - 2020
Event: 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020 - Virtual, Online
Duration: Aug 3 2020 to Aug 6 2020

Conference

Conference: 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020
City: Virtual, Online
Period: 8/3/20 to 8/6/20

ASJC Scopus subject areas

  • Artificial Intelligence
