TY - GEN
T1 - Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
AU - Pitis, Silviu
AU - Chan, Harris
AU - Zhao, Stephen
AU - Stadie, Bradly
AU - Ba, Jimmy
N1 - Funding Information:
We thank Sheng Jia, Michael R. Zhang and the anonymous reviewers for their helpful comments. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (https://vectorinstitute.ai/#partners).
Publisher Copyright:
© 2020 37th International Conference on Machine Learning, ICML 2020. All rights reserved.
PY - 2020
Y1 - 2020
AB - What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? When the desired (test time) goal distribution is too distant to offer a useful learning signal, we argue that the agent should not pursue unobtainable goals. Instead, it should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution. We propose to optimize this objective by having the agent pursue past achieved goals in sparsely explored areas of the goal space, which focuses exploration on the frontier of the achievable goal set. We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks including maze navigation and block stacking.
UR - http://www.scopus.com/inward/record.url?scp=85105304010&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105304010&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105304010
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 7706
EP - 7717
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daumé III, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -