Multi-Agent Deep Reinforcement Learning for Dynamic Power Allocation in Wireless Networks

Yasar Sinan Nasir*, Dongning Guo

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

390 Scopus citations


This work demonstrates the potential of deep reinforcement learning techniques for transmit power control in wireless networks. Existing techniques typically find near-optimal power allocations by solving a challenging optimization problem. Most of these algorithms are not scalable to large networks in real-world scenarios because of their computational complexity and instantaneous cross-cell channel state information (CSI) requirement. In this paper, a distributively executed dynamic power allocation scheme is developed based on model-free deep reinforcement learning. Each transmitter collects CSI and quality of service (QoS) information from several neighbors and adapts its own transmit power accordingly. The objective is to maximize a weighted sum-rate utility function, which can be particularized to achieve maximum sum-rate or proportionally fair scheduling. Both random variations and delays in the CSI are inherently addressed using deep Q-learning. For a typical network architecture, the proposed algorithm is shown to achieve near-optimal power allocation in real time based on delayed CSI measurements available to the agents. The proposed scheme is especially suitable for practical scenarios where the system model is inaccurate and CSI delay is non-negligible.
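The weighted sum-rate objective described in the abstract can be made concrete with a short Python sketch. It assumes a standard single-antenna interference model in which link i achieves log2(1 + SINR_i) bit/s/Hz, with gains[j][i] the channel gain from transmitter j to receiver i. Setting every weight to 1 gives the maximum sum-rate objective; setting each weight to the inverse of the link's long-term average rate gives proportionally fair scheduling. The function names, the gain-matrix layout, and the exponential moving average used to track average rates are illustrative assumptions, not details taken from the paper, and the deep Q-learning agent itself is not shown.

```python
import math
from typing import List, Sequence


def sinr(gains: Sequence[Sequence[float]], powers: Sequence[float], noise: float) -> List[float]:
    """SINR of every link; gains[j][i] is the gain from transmitter j to receiver i (assumed layout)."""
    n = len(powers)
    out = []
    for i in range(n):
        signal = gains[i][i] * powers[i]
        # Every other transmitter contributes interference at receiver i.
        interference = sum(gains[j][i] * powers[j] for j in range(n) if j != i)
        out.append(signal / (interference + noise))
    return out


def rates(gains, powers, noise) -> List[float]:
    """Spectral efficiency log2(1 + SINR) of each link, in bit/s/Hz."""
    return [math.log2(1.0 + s) for s in sinr(gains, powers, noise)]


def weighted_sum_rate(weights, gains, powers, noise) -> float:
    """Weighted sum-rate utility: sum_i w_i * r_i."""
    return sum(w * r for w, r in zip(weights, rates(gains, powers, noise)))


def pf_weights(avg_rates, eps: float = 1e-9) -> List[float]:
    """Proportional-fair weights: inverse of each link's long-term average rate."""
    return [1.0 / max(r, eps) for r in avg_rates]


def update_avg_rates(avg_rates, new_rates, beta: float = 0.01) -> List[float]:
    """Hypothetical exponential moving average used to track long-term rates."""
    return [(1.0 - beta) * a + beta * r for a, r in zip(avg_rates, new_rates)]


if __name__ == "__main__":
    G = [[1.0, 0.1], [0.1, 1.0]]  # two links with weak cross gains
    print(weighted_sum_rate([1.0, 1.0], G, [1.0, 1.0], 0.1))  # both links on
    print(weighted_sum_rate([1.0, 1.0], G, [1.0, 0.0], 0.1))  # link 1 silenced
```

In this toy two-link example both links transmitting yields a higher sum-rate than silencing one, but with stronger cross gains the balance can flip. That trade-off is what a distributed power-control agent has to learn from its neighbors' CSI and QoS information.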

Original language: English (US)
Article number: 8792117
Pages (from-to): 2239-2250
Number of pages: 12
Journal: IEEE Journal on Selected Areas in Communications
Issue number: 10
State: Published - Oct 2019


Keywords

  • Deep Q-learning
  • Jakes fading model
  • interference mitigation
  • power control
  • radio resource management

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Electrical and Electronic Engineering

