Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review


Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains elusive how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance difference lemma that characterizes the landscape of multi-agent policy optimization, we find that the localized action value function serves as an ideal descent direction for each local policy. Motivated by this observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO. We prove that, under standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate. We extend our algorithm to the off-policy setting and introduce pessimism to policy evaluation, which aligns with experiments. To our knowledge, this is the first provably convergent multi-agent PPO algorithm in cooperative Markov games.

Original language: English (US)
Pages (from-to): 42200-42226
Number of pages: 27
Journal: Proceedings of Machine Learning Research
State: Published - 2023
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: Jul 23 2023 to Jul 29 2023

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

