Hindsight Learning for MDPs with Exogenous Inputs

Sean R. Sinclair*, Felipe Frujeri, Ching An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes comes from exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algorithms achieve data efficiency by leveraging a key insight: given samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvements. We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem, allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.
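The hindsight idea in the abstract can be illustrated on a toy version of the multi-secretary problem. The sketch below is not the authors' implementation; it only shows how, once a full trace of exogenous inputs (here, candidate values) is available, the consequence of each past action can be scored counterfactually. The function names `hindsight_value` and `hindsight_q`, the hiring budget, and the assumption that candidate values are non-negative are all hypothetical choices made for this illustration.

```python
from typing import Sequence


def hindsight_value(values: Sequence[float], start: int, budget: int) -> float:
    """Best total value obtainable from step `start` onward with `budget`
    hires, when the whole trace of candidate values is known in advance.

    Assumes non-negative values, so the optimum is simply the sum of the
    `budget` largest remaining values.
    """
    if budget <= 0:
        return 0.0
    remaining = sorted(values[start:], reverse=True)
    return float(sum(remaining[:budget]))


def hindsight_q(values: Sequence[float], t: int, budget: int, accept: bool) -> float:
    """Counterfactual score of taking an action at step `t`: the immediate
    reward plus the hindsight-optimal value of the rest of the trace."""
    if accept and budget <= 0:
        raise ValueError("cannot accept a candidate with no budget left")
    reward = values[t] if accept else 0.0
    next_budget = budget - 1 if accept else budget
    return reward + hindsight_value(values, t + 1, next_budget)


# Illustrative trace of exogenous inputs (made-up numbers).
trace = [0.2, 0.9, 0.5, 0.7]
q_accept = hindsight_q(trace, t=0, budget=2, accept=True)
q_reject = hindsight_q(trace, t=0, budget=2, accept=False)
print("accept:", q_accept, "reject:", q_reject)
```

On this trace, rejecting the first candidate scores higher in hindsight than accepting, because two better candidates arrive later. Scores of this kind, computed over many sampled traces, are the sort of counterfactual signal the abstract describes as accelerating policy improvement; how the paper turns them into a learning target is not specified here.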

Original language: English (US)
Pages (from-to): 31877-31914
Number of pages: 38
Journal: Proceedings of Machine Learning Research
Volume: 202
State: Published - 2023
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: Jul 23 2023 to Jul 29 2023

Funding

We thank Janardhan Kulkarni, Beibin Li, Connor Lawless, Siddhartha Banerjee, and Christina Yu for inspiring discussions. We thank Dhivya Eswaran, Tara Safavi, and Tobias Schnabel for reviewing early drafts. Part of this work was done while Sean Sinclair and Jingling Li were research interns at Microsoft Research, and while Sean Sinclair was a visitor at the Simons Institute for the semester program on Data-Driven Decision Processes. We gratefully acknowledge funding from the National Science Foundation under grants ECCS-1847393, DMS-1839346, CCF-1948256, CNS-195599, and CNS-1955997, the Air Force Office of Scientific Research under grant FA9550-23-1-0068, and the Army Research Laboratory under grants W911NF-19-1-0217 and W911NF-17-1-0094.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
