Abstract
Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes comes from exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algorithms achieve data efficiency by leveraging a key insight: given samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvements. We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem, the allocation of Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.
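To make the hindsight idea concrete, here is a minimal sketch for a toy multi-secretary setting. It is an illustration only, not the authors' HL algorithm: it assumes candidate values arrive one at a time, that at most `budget` of them can be accepted, and that a full trace of values (the exogenous input) has already been observed. The function names `hindsight_value` and `counterfactual_q` are hypothetical.

```python
from typing import List, Tuple


def hindsight_value(values: List[float], t: int, budget: int) -> float:
    """Best total value obtainable from step t onward when the trace is fully known.

    With the whole trace visible, the optimal plan simply accepts the
    `budget` largest remaining values.
    """
    remaining = sorted(values[t:], reverse=True)
    return sum(remaining[:max(budget, 0)])


def counterfactual_q(values: List[float], t: int, budget: int) -> Tuple[float, float]:
    """Hindsight estimates (q_reject, q_accept) for the decision at step t.

    Each estimate is the immediate reward of the action plus the
    hindsight-optimal value of the rest of the same trace.
    """
    q_reject = hindsight_value(values, t + 1, budget)
    if budget > 0:
        q_accept = values[t] + hindsight_value(values, t + 1, budget - 1)
    else:
        # Accepting is infeasible once the budget is exhausted (assumption of this toy model).
        q_accept = float("-inf")
    return q_reject, q_accept


# Hypothetical trace of candidate values, revisited after the fact.
trace = [3, 9, 5, 7]
q_reject, q_accept = counterfactual_q(trace, t=0, budget=2)
print(q_reject, q_accept)  # 16 12
```

On this trace, rejecting the first candidate looks better in hindsight (16 versus 12), because two stronger candidates arrive later. Estimates of this kind, computed on recorded traces, are the sort of counterfactual signal the abstract describes; how they are used to train a policy is specified in the paper, not here.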
Original language | English (US) |
---|---|
Pages (from-to) | 31877-31914 |
Number of pages | 38 |
Journal | Proceedings of Machine Learning Research |
Volume | 202 |
State | Published - 2023 |
Event | 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States. Duration: Jul 23 2023 → Jul 29 2023 |
Funding
We thank Janardhan Kulkarni, Beibin Li, Connor Lawless, Siddhartha Banerjee, and Christina Yu for inspiring discussions. We thank Dhivya Eswaran, Tara Safavi, and Tobias Schnabel for reviewing early drafts. Part of this work was done while Sean Sinclair and Jingling Li were research interns at Microsoft Research, and while Sean Sinclair was a visitor at the Simons Institute for the semester program on Data-Driven Decision Processes. We gratefully acknowledge funding from the National Science Foundation under grants ECCS-1847393, DMS-1839346, CCF-1948256, CNS-195599, and CNS-1955997, the Air Force Office of Scientific Research under grant FA9550-23-1-0068, and the Army Research Laboratory under grants W911NF-19-1-0217 and W911NF-17-1-0094.
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability