Abstract
We study reinforcement learning for partially observable multi-agent systems where each agent only has access to its own observation and reward and aims to maximize its cumulative rewards. To handle partial observations, we propose graph-assisted predictive state representations (GAPSR), a scalable multi-agent representation learning framework that leverages the agent connectivity graphs to aggregate local representations computed by each agent. In addition, our representations are readily able to incorporate dynamic interaction graphs and kernel space embeddings of the predictive states, and thus have strong flexibility and representation power. Based on GAPSR, we propose an end-to-end MARL algorithm that simultaneously infers the predictive representations and uses the representations as the input of a policy optimization algorithm. Empirically, we demonstrate the efficacy of the proposed algorithm provided on both a MAMuJoCo robotic learning experiment and a multi-agent particle learning environment.
Original language | English (US) |
---|---|
State | Published - 2022 |
Event | 10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online Duration: Apr 25 2022 → Apr 29 2022 |
Conference
Conference | 10th International Conference on Learning Representations, ICLR 2022 |
---|---|
City | Virtual, Online |
Period | 4/25/22 → 4/29/22 |
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language