Distributed MARL for Scheduling in Conflict Graphs

Yiming Zhang*, Dongning Guo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

This paper addresses a link scheduling problem in networks represented by conflict graphs using a distributed learning approach. Each agent in the network controls a single link and has access only to its own state and the states of links in its neighborhood. The goal is to minimize the average packet delay in the multi-agent network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP). The proposed solution adopts a centralized training and distributed execution paradigm and leverages an on-policy reinforcement learning algorithm. Specifically, the paper employs the multi-agent proximal policy optimization (MAPPO) algorithm with judiciously designed recurrent structures in the neural network. Simulations show that the proposed solution outperforms some widely used schedulers in terms of both throughput and delay.
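The setting in the abstract can be made concrete with a small sketch. It is an illustration only and not the authors' MAPPO method: it models a conflict graph as an adjacency map, builds the local observation available to one link, and runs a greedy longest-queue-first baseline. The function names, the one-packet-per-slot service model, the Bernoulli arrivals, and the choice of baseline are all assumptions made for this example; the abstract does not name the schedulers it compares against.

```python
import random

# Hypothetical conflict graph: links are nodes, and an edge means the two
# links cannot transmit in the same slot. Here: a 4-link path 0-1-2-3.
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def local_observation(queues, neighbors, link):
    """Return what one agent sees: its own queue and its neighbors' queues."""
    return {
        "own": queues[link],
        "neighbors": {n: queues[n] for n in neighbors[link]},
    }

def is_feasible(schedule, neighbors):
    """A schedule is feasible if no two scheduled links conflict."""
    return all(n not in schedule for link in schedule for n in neighbors[link])

def greedy_longest_queue_schedule(queues, neighbors):
    """Assumed baseline: visit links by decreasing queue length and
    schedule each nonempty link that does not conflict with earlier picks."""
    schedule = set()
    for link in sorted(queues, key=lambda l: (-queues[l], l)):
        if queues[link] > 0 and all(n not in schedule for n in neighbors[link]):
            schedule.add(link)
    return schedule

def step(queues, schedule, arrivals):
    """Serve one packet on each scheduled link, then add new arrivals."""
    return {
        l: max(queues[l] - (1 if l in schedule else 0), 0) + arrivals[l]
        for l in queues
    }

def simulate(neighbors, arrival_prob, slots, seed=0):
    """Return the time-averaged total backlog under the greedy baseline."""
    rng = random.Random(seed)
    queues = {l: 0 for l in neighbors}
    total_backlog = 0
    for _ in range(slots):
        schedule = greedy_longest_queue_schedule(queues, neighbors)
        arrivals = {l: int(rng.random() < arrival_prob) for l in queues}
        queues = step(queues, schedule, arrivals)
        total_backlog += sum(queues.values())
    return total_backlog / slots

if __name__ == "__main__":
    print(simulate(NEIGHBORS, arrival_prob=0.3, slots=1000))
```

In this sketch the average backlog stands in for delay; by Little's law the two are proportional at a fixed arrival rate. A learned policy would replace `greedy_longest_queue_schedule` with per-link decisions computed from `local_observation` alone.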

Original language: English (US)
Title of host publication: 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350328141
State: Published - 2023
Event: 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023 - Monticello, United States
Duration: Sep 26 2023 to Sep 29 2023

Publication series

Name: 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023

Conference

Conference: 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023
Country/Territory: United States
City: Monticello
Period: 9/26/23 to 9/29/23

Funding

The work was supported in part by the National Science Foundation under grant No. 2003098, a gift from Intel Corporation, the SpectrumX Center (NSF grant No. 2132700), and also the Institute for Data, Econometrics, Algorithms and Learning (NSF grant No. 2216970). This work was supported in part by NSF under Contract Nos. CNS-1719384 and IIS-1636772.

Keywords

  • Decentralized partially observable Markov decision process (Dec-POMDP)
  • dynamic traffic
  • multi-agent reinforcement learning (MARL)
  • recurrent neural networks
  • wireless networks

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Computer Science Applications
  • Computational Mathematics
  • Control and Optimization
