An information-theoretic analysis of Thompson sampling

Daniel Russo, Benjamin Van Roy

Research output: Contribution to journalArticlepeer-review

87 Scopus citations

Abstract

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.

Original languageEnglish (US)
JournalJournal of Machine Learning Research
Volume17
StatePublished - Apr 1 2016

Keywords

  • Information theory
  • Mutli-armed bandit
  • Online optimization
  • Regret bounds
  • Thompson sampling

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'An information-theoretic analysis of Thompson sampling'. Together they form a unique fingerprint.

Cite this