Bandit learning in decentralized matching markets

Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple-player setting with competition. We introduce a new algorithm for this setting that, over a time horizon T, attains O(log(T)) stable regret when preferences of the arms over players are shared, and O(log(T)²) regret when there are no assumptions on the preferences on either side. Moreover, in the setting where a single player may deviate, we show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
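To make the setting concrete, the sketch below simulates a toy decentralized matching market of the kind described in the abstract: each player independently runs a UCB-style index and proposes to one arm per round, each arm accepts only its most-preferred proposer, and rejected players receive no reward or feedback. This is an illustrative simulation under assumptions chosen here (Gaussian rewards, arms preferring lower-indexed players as a stand-in for shared preferences, the helper arm_accepts), not the paper's algorithm or its regret analysis.

```python
# Toy sketch of a decentralized matching bandit market (assumptions, not the paper's method):
# players learn arm rewards with UCB indices; arms break conflicts by a shared preference order.
import numpy as np

rng = np.random.default_rng(0)
N_PLAYERS, N_ARMS, HORIZON = 3, 5, 2000

# true_means[i, j]: player i's (unknown) mean reward for arm j.
true_means = rng.uniform(0.1, 0.9, size=(N_PLAYERS, N_ARMS))

def arm_accepts(proposers):
    # Shared arm preferences (assumed): every arm prefers lower-indexed players.
    return min(proposers)

counts = np.zeros((N_PLAYERS, N_ARMS))  # pulls of each (player, arm) pair
sums = np.zeros((N_PLAYERS, N_ARMS))    # cumulative observed rewards

for t in range(1, HORIZON + 1):
    # Each player computes its own UCB index with no communication.
    proposals = []
    for i in range(N_PLAYERS):
        pulls = np.maximum(counts[i], 1)
        means = sums[i] / pulls
        bonus = np.sqrt(2 * np.log(t) / pulls)
        ucb = np.where(counts[i] > 0, means + bonus, np.inf)  # unexplored arms first
        proposals.append(int(np.argmax(ucb)))

    # Each arm resolves competition by accepting its most-preferred proposer;
    # rejected players get no reward and no feedback this round.
    for j in range(N_ARMS):
        proposers = [i for i, a in enumerate(proposals) if a == j]
        if not proposers:
            continue
        winner = arm_accepts(proposers)
        reward = rng.normal(true_means[winner, j], 0.1)
        counts[winner, j] += 1
        sums[winner, j] += reward

print("empirical means per (player, arm):")
print(np.round(sums / np.maximum(counts, 1), 2))
```

Running the sketch shows each player's empirical reward estimates concentrating on the arms it is actually able to match with, which is the kind of competition-aware exploration the abstract's stable-regret guarantees formalize.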

Original language: English (US)
Journal: Journal of Machine Learning Research
Volume: 22
State: Published - 2021

Keywords

  • Multi-armed bandits
  • Online learning
  • Stable matching
  • Two-sided markets

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Statistics and Probability
