Gap-Dependent Bounds for Two-Player Markov Games

Zehao Dou, Zhuoran Yang, Zhaoran Wang, Simon S. Du

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations


As one of the most popular methods in reinforcement learning, Q-learning has received increasing attention, and recent theoretical work has established regret bounds for algorithms in the Q-learning class across a variety of settings. In this paper, we analyze the cumulative regret of the Nash Q-learning algorithm on two-player turn-based stochastic Markov games (2-TBSG) and propose the first gap-dependent logarithmic upper bounds in the episodic tabular setting. This bound matches the lower bound up to a factor of the horizon. Furthermore, we extend the result to the infinite-horizon discounted game setting and propose a similar gap-dependent logarithmic regret bound. In addition, under the linear MDP assumption, we obtain another logarithmic regret bound for 2-TBSG, in both the centralized and independent settings.
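To illustrate the 2-TBSG structure the abstract refers to, here is a minimal, hypothetical sketch (not the paper's algorithm): in a turn-based game exactly one player controls each state, so the Nash value reduces to the acting player maximizing or minimizing their own Q-values. All names (`owner`, `P`, `R`, the toy sizes) are illustrative assumptions, and the update below is plain finite-horizon backward induction on a random toy game rather than the regret-minimizing Nash Q-learning procedure analyzed in the paper.

```python
import numpy as np

# Toy 2-TBSG (illustrative assumption): at each state exactly one player acts.
# Player 1 (max) controls states with owner[s] == 0; player 2 (min) the rest.
n_states, n_actions, horizon = 4, 2, 5
rng = np.random.default_rng(0)
owner = np.array([0, 1, 0, 1])                    # which player moves in each state
P = rng.dirichlet(np.ones(n_states),
                  size=(n_states, n_actions))     # transition kernel P[s, a, s']
R = rng.random((n_states, n_actions))             # reward to player 1, in [0, 1)

# Finite-horizon backward induction: the turn-based analogue of the Nash value,
# since the acting player simply optimizes their own one-step Q-values.
V = np.zeros((horizon + 1, n_states))
for h in range(horizon - 1, -1, -1):
    Q = R + P @ V[h + 1]                          # Q[s, a] = r(s, a) + E[V_{h+1}(s')]
    V[h] = np.where(owner == 0, Q.max(axis=1), Q.min(axis=1))
```

The gap-dependent bounds in the paper concern how quickly a learner's regret shrinks when suboptimal actions have Q-values separated from the optimum by a positive gap; the sketch above only computes the exact values a learner would be converging toward.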

Original language: English (US)
Pages (from-to): 432-455
Number of pages: 24
Journal: Proceedings of Machine Learning Research
State: Published - 2022
Event: 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022 - Virtual, Online, Spain
Duration: Mar 28, 2022 to Mar 30, 2022

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability


