Fully decentralized multi-agent reinforcement learning with networked agents

Kaiqing Zhang*, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

101 Scopus citations

Abstract

We consider the fully decentralized multi-agent reinforcement learning (MARL) problem, where the agents are connected via a time-varying and possibly sparse communication network. Specifically, we assume that the reward functions of the agents might correspond to different tasks, and are only known to the corresponding agent. Moreover, each agent makes individual decisions based on both the information observed locally and the messages received from its neighbors over the network. To maximize the globally averaged return over the network, we propose two fully decentralized actor-critic algorithms, which are applicable to large-scale MARL problems in an online fashion. Convergence guarantees are provided when the value functions are approximated within the class of linear functions. Our work appears to be the first theoretical study of fully decentralized MARL algorithms for networked agents that use function approximation.
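The neighbor-to-neighbor exchange described in the abstract can be illustrated with a consensus (averaging) step over a communication graph. The following is a minimal sketch, not the paper's algorithms: the function names, the Metropolis weighting rule, the fixed ring topology, and the example parameter vectors are all assumptions made for illustration, and the paper's setting additionally allows the network to vary over time.

```python
def metropolis_weights(neighbors):
    """Return a symmetric, doubly stochastic mixing matrix for an undirected graph.

    neighbors[i] lists the agents adjacent to agent i (hypothetical input format).
    """
    n = len(neighbors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            # Metropolis rule: the weight depends only on the two endpoint degrees.
            w[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # The self-weight makes each row sum to one.
        w[i][i] = 1.0 - sum(w[i])
    return w


def consensus_step(w, params):
    """Each agent replaces its vector with a weighted average over its neighborhood."""
    n, dim = len(params), len(params[0])
    return [
        [sum(w[i][j] * params[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]


def network_average(params):
    """Centralized average, used here only to check the decentralized result."""
    n, dim = len(params), len(params[0])
    return [sum(p[k] for p in params) / n for k in range(dim)]


# Four agents on a ring; each holds a private 2-dimensional parameter vector
# (for example, linear value-function weights in this illustration).
ring = [[1, 3], [0, 2], [1, 3], [0, 2]]
local_params = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 1.0]]

weights = metropolis_weights(ring)
mixed = local_params
for _ in range(50):
    mixed = consensus_step(weights, mixed)

print(mixed)
print(network_average(local_params))
```

Because the mixing matrix in this sketch is doubly stochastic, each step preserves the network-wide mean, and repeated steps drive every agent's vector toward that mean without any central coordinator collecting the private values.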

Original language: English (US)
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Andreas Krause, Jennifer Dy
Publisher: International Machine Learning Society (IMLS)
Pages: 9340-9371
Number of pages: 32
ISBN (Electronic): 9781510867963
State: Published - 2018
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 to Jul 15 2018

Publication series

Name: 35th International Conference on Machine Learning, ICML 2018
Volume: 13

Other

Other: 35th International Conference on Machine Learning, ICML 2018
Country/Territory: Sweden
City: Stockholm
Period: 7/10/18 to 7/15/18

Funding

The work of K.Z. and T.B. was supported in part by the US Army Research Laboratory (ARL) Cooperative Agreement W911NF-17-2-0196, and in part by the US Army Research Office (ARO) Grant W911NF-16-1-0485. The research of H.L. was supported by NSF CAREER Award DMS1454377, NSF IIS1408910, and NSF IIS1332109. This material is based upon work supported by the National Science Foundation under grant no. 1740762 "Collaborative Research: TRIPODS Institute for Optimization and Learning". We would also like to thank all the anonymous reviewers for their helpful suggestions and supportive comments.

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software
