TY - GEN
T1 - Feedback-based tree search for reinforcement learning
AU - Jiang, Daniel R.
AU - Ekwedike, Emmanuel
AU - Liu, Han
N1 - Funding Information:
We wish to thank four anonymous reviewers, whose feedback helped to significantly improve the paper. We also thank our colleagues at Tencent AI Lab, particularly Carson Eisenach and Xiangru Lian, for technical help. Daniel Jiang is grateful for the support from Tencent AI Lab through a faculty award. The research of Han Liu was supported by NSF CAREER Award DMS-1454377, NSF IIS-1408910, and NSF IIS-1332109. This material is also based upon work supported by the National Science Foundation under grant no. 1740762 "Collaborative Research: TRIPODS Institute for Optimization and Learning."
Publisher Copyright:
© 2018 by authors. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Inspired by recent successes of Monte Carlo tree search (MCTS) in a number of artificial intelligence (AI) application domains, we propose a reinforcement learning (RL) technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon Markov decision process. The terminal condition of the finite-horizon problems, or the leaf-node evaluator of the decision tree generated by MCTS, is specified using a combination of an estimated value function and an estimated policy function. The recommendations generated by the MCTS procedure are then provided as feedback in order to refine, through classification and regression, the leaf-node evaluator for the next iteration. We provide the first sample complexity bounds for a tree search-based RL algorithm. In addition, we show that a deep neural network implementation of the technique can create a competitive AI agent for the popular multiplayer online battle arena (MOBA) game King of Glory.
AB - Inspired by recent successes of Monte Carlo tree search (MCTS) in a number of artificial intelligence (AI) application domains, we propose a reinforcement learning (RL) technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon Markov decision process. The terminal condition of the finite-horizon problems, or the leaf-node evaluator of the decision tree generated by MCTS, is specified using a combination of an estimated value function and an estimated policy function. The recommendations generated by the MCTS procedure are then provided as feedback in order to refine, through classification and regression, the leaf-node evaluator for the next iteration. We provide the first sample complexity bounds for a tree search-based RL algorithm. In addition, we show that a deep neural network implementation of the technique can create a competitive AI agent for the popular multiplayer online battle arena (MOBA) game King of Glory.
UR - http://www.scopus.com/inward/record.url?scp=85057222494&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057222494&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057222494
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 3572
EP - 3590
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
PB - International Machine Learning Society (IMLS)
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -