Abstract
We interpret meta-reinforcement learning as the problem of learning how to quickly find a good sampling distribution in a new environment. This interpretation leads to the development of two new meta-reinforcement learning algorithms: E-MAML and E-RL2. Results are presented on a new environment we call 'Krazy World': a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning. Further results are presented on a set of maze environments. We show E-MAML and E-RL2 deliver better performance than baseline algorithms on both tasks.
Original language | English (US) |
---|---|
Pages (from-to) | 9280-9290 |
Number of pages | 11 |
Journal | Advances in Neural Information Processing Systems |
Volume | 2018-December |
State | Published - 2018 |
Event | 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada Duration: Dec 2 2018 → Dec 8 2018 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing