A Low-Rank Approximation for MDPs via Moment Coupling

Amy B.Z. Zhang*, Itai Gurvich

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

Abstract

We introduce a framework to approximate Markov decision processes (MDPs) that stands on two pillars: (i) state aggregation, as the algorithmic infrastructure, and (ii) central-limit-theorem-type approximations, as the mathematical underpinning of optimality guarantees. The theory is grounded in recent work by Braverman et al. [Braverman A, Gurvich I, Huang J (2020) On the Taylor expansion of value functions. Oper. Res. 68(2):631–65] that relates the solution of the Bellman equation to that of a partial differential equation (PDE) where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is not required by our method. Instead, we construct a “sister” (controlled) Markov chain whose two local transition moments are approximately identical with those of the focal chain. Because of this moment matching, the original chain and its sister are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into standard soft aggregation algorithms, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities. Computational gains arise from the reduction of the effective state space from N to N12+∊ is as one might intuitively expect from approximations grounded in the central limit theorem.

Original languageEnglish (US)
Pages (from-to)1255-1277
Number of pages23
JournalOperations Research
Volume72
Issue number3
DOIs
StatePublished - May 1 2024

Funding

Funding:This work was supported by the National Science Foundation [Grant CMMI-1662294]. Supplemental Material:The online appendix is available at https:/ /doi.org/10.1287/opre.2022.2392 .

Keywords

  • Markov processes
  • algorithm analysis
  • approximate dynamic programming
  • parameter design
  • state aggregation

ASJC Scopus subject areas

  • Computer Science Applications
  • Management Science and Operations Research

Fingerprint

Dive into the research topics of 'A Low-Rank Approximation for MDPs via Moment Coupling'. Together they form a unique fingerprint.

Cite this