Multi-agent reinforcement learning via double averaging primal-dual optimization

Hoi To Wai, Zhuoran Yang, Mingyi Hong, Zhaoran Wang

Research output: Contribution to journal › Conference article

Abstract

Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents. Motivated by decentralized applications such as sensor networks, swarm robotics, and power grids, we study policy evaluation in MARL, where agents with jointly observed state-action pairs and private local rewards collaborate to learn the value of a given policy. In this paper, we propose a double averaging scheme, where each agent iteratively performs averaging over both space and time to incorporate neighboring gradient information and local reward information, respectively. We prove that the proposed algorithm converges to the optimal solution at a global geometric rate. In particular, the algorithm is built upon a primal-dual reformulation of the mean squared projected Bellman error minimization problem, which gives rise to a decentralized convex-concave saddle-point problem. To the best of our knowledge, the proposed double averaging primal-dual optimization algorithm is the first to achieve fast finite-time convergence on decentralized convex-concave saddle-point problems.
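
For readers who want the reformulation spelled out: under linear value-function approximation V_θ(s) = φ(s)ᵀθ, the mean squared projected Bellman error (MSPBE) admits a standard Fenchel-dual rewriting into a convex-concave saddle-point problem. The sketch below uses our own notation for this standard construction; the paper's exact formulation may differ in details.

```latex
% MSPBE under linear function approximation, with
%   A = E[ phi_t (phi_t - gamma * phi_{t+1})^T ],  C = E[ phi_t phi_t^T ],
%   b_i = E[ r_{i,t} phi_t ]  (private to agent i),  b = (1/N) sum_i b_i.
\mathrm{MSPBE}(\theta)
  = \tfrac{1}{2}\,\bigl\| A\theta - b \bigr\|_{C^{-1}}^{2}
  = \max_{w}\; \langle w,\, A\theta - b \rangle - \tfrac{1}{2}\,\| w \|_{C}^{2}.
% Splitting the private b_i across one dual variable per agent yields a
% decentralized saddle-point problem that equals the MSPBE up to a
% theta-independent constant, so the minimizers coincide:
\min_{\theta}\;\max_{w_1,\dots,w_N}\;
  \frac{1}{N}\sum_{i=1}^{N}
  \Bigl( \langle w_i,\, A\theta - b_i \rangle - \tfrac{1}{2}\,\| w_i \|_{C}^{2} \Bigr).
```

The following is a minimal numerical sketch of the double-averaging idea on this saddle-point problem: an incremental running average of per-sample primal gradients (averaging over time) combined with gossip-based mixing via a gradient-tracking recursion (averaging over space). It illustrates the technique the abstract describes, not the paper's exact algorithm; the ring topology, step sizes, and synthetic data are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N, d, M = 4, 5, 50                   # agents, feature dimension, samples
gamma = 0.9                          # discount factor
beta_theta, beta_w = 0.05, 0.05      # primal / dual step sizes (assumptions)

# Synthetic jointly observed features and private local rewards (assumptions).
phi = rng.normal(size=(M, d))        # phi(s_t)
phi_next = rng.normal(size=(M, d))   # phi(s_{t+1})
rewards = rng.normal(size=(N, M))    # r_{i,t}, private to agent i

# Ring-topology gossip matrix, doubly stochastic (an assumption; any
# connected doubly stochastic mixing matrix would do).
W = (0.5 * np.eye(N)
     + 0.25 * np.roll(np.eye(N), 1, axis=0)
     + 0.25 * np.roll(np.eye(N), -1, axis=0))

theta = np.zeros((N, d))             # local primal iterates theta_i
w = np.zeros((N, d))                 # local dual iterates w_i
grad_table = np.zeros((N, M, d))     # last evaluated per-sample primal grads
s = np.zeros((N, d))                 # space-and-time averaged grad surrogate

for t in range(5000):
    m = t % M                        # jointly observed sample index
    A_m = np.outer(phi[m], phi[m] - gamma * phi_next[m])
    C_m = np.outer(phi[m], phi[m])

    # Averaging over time: refresh sample m's entry in the gradient table and
    # propagate only the increment (SAG-style incremental averaging; the
    # table starts at zero, so early averages are biased, as in SAG).
    delta = np.zeros((N, d))
    for i in range(N):
        g_new = A_m.T @ w[i]         # per-sample primal gradient A_m^T w_i
        delta[i] = (g_new - grad_table[i, m]) / M
        grad_table[i, m] = g_new

    # Averaging over space: gossip with neighbors plus the local increment
    # (a gradient-tracking recursion).
    s = W @ s + delta

    # Dual ascent using only agent i's private reward; primal descent along
    # the doubly averaged gradient surrogate.
    for i in range(N):
        w[i] += beta_w * (A_m @ theta[i] - rewards[i, m] * phi[m] - C_m @ w[i])
    theta -= beta_theta * s

print("max consensus gap in theta:", np.abs(theta - theta.mean(axis=0)).max())
```

Note that each agent's dual step touches only its private reward r_{i,m}, while its primal step uses the surrogate s_i, which tracks the network-average primal gradient (1/N) Σ_j Aᵀw_j — this is what lets agents evaluate a global value function without ever exchanging rewards.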

Original language: English (US)
Pages (from-to): 9649-9660
Number of pages: 12
Journal: Advances in Neural Information Processing Systems
ISSN: 1049-5258
Volume: 2018-December
State: Published - Jan 1 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018 - Dec 8 2018

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Multi-agent reinforcement learning via double averaging primal-dual optimization. / Wai, Hoi To; Yang, Zhuoran; Hong, Mingyi; Wang, Zhaoran.

In: Advances in Neural Information Processing Systems, Vol. 2018-December, 01.01.2018, p. 9649-9660.

Scopus record: http://www.scopus.com/inward/record.url?scp=85064807489&partnerID=8YFLogxK