Abstract
We reinterpret the problem of finding intrinsic rewards in reinforcement learning (RL) as a bilevel optimization problem. With this interpretation, we can leverage recent advances in the hyperparameter optimization literature, mainly from Self-Tuning Networks (STN), to learn intrinsic rewards. To facilitate our method, we introduce a new general conditioning layer: Conditional Layer Normalization (CLN). We evaluate our method on several continuous control benchmarks in the MuJoCo physics simulator. On all of these benchmarks, the intrinsic rewards learned on the fly lead to higher final rewards.
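The abstract does not spell out the mechanics of Conditional Layer Normalization, but a FiLM-style reading is that the scale and shift of a layer normalization are generated from a conditioning vector (e.g., the hyperparameters governing the intrinsic reward). The sketch below illustrates that interpretation; the class name, dimensions, and the additive modulation of the affine parameters are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalLayerNorm(nn.Module):
    """Illustrative sketch of a conditional layer normalization:
    standard layer norm whose affine parameters are modulated by a
    conditioning vector (hypothetical formulation, not the paper's)."""

    def __init__(self, num_features: int, cond_dim: int):
        super().__init__()
        # Base affine parameters, as in ordinary LayerNorm.
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        # Linear maps from the conditioning vector to per-feature offsets.
        self.cond_gamma = nn.Linear(cond_dim, num_features)
        self.cond_beta = nn.Linear(cond_dim, num_features)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Normalize over the feature dimension without a learned affine.
        x = F.layer_norm(x, x.shape[-1:])
        # Shift the scale and bias according to the conditioning input.
        gamma = self.gamma + self.cond_gamma(cond)
        beta = self.beta + self.cond_beta(cond)
        return gamma * x + beta


# Usage: condition a batch of 64-dim hidden activations on a
# 4-dimensional (hypothetical) hyperparameter vector.
cln = ConditionalLayerNorm(num_features=64, cond_dim=4)
h = torch.randn(32, 64)
hyper = torch.randn(32, 4)
out = cln(h, hyper)  # shape: (32, 64)
```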
| Original language | English (US) |
|---|---|
| Pages (from-to) | 111-120 |
| Number of pages | 10 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 124 |
| State | Published - 2020 |
| Event | 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020 - Virtual, Online. Duration: Aug 3, 2020 → Aug 6, 2020 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability