TY - JOUR
T1 - On the Taylor expansion of value functions
AU - Braverman, Anton
AU - Gurvich, Itai
AU - Huang, Junfei
N1 - Funding Information:
Funding: The work of Itai Gurvich is supported by the National Science Foundation [CMMI-1662294]. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2019.1903.
Publisher Copyright:
© 2020 INFORMS.
PY - 2020/3
Y1 - 2020/3
N2 - We introduce a framework for approximate dynamic programming that we apply to discrete-time chains on ℤ_+^d with countable action sets. The framework is grounded in the approximation of the (controlled) chain's generator by that of another Markov process. In simple terms, our approach stipulates applying a second-order Taylor expansion to the value function, replacing the Bellman equation with one in continuous space and time in which the transition matrix is reduced to its first and second moments. In some cases, the resulting equation can be interpreted as a Hamilton-Jacobi-Bellman equation for a Brownian control problem. When tractable, the "Taylored" equation serves as a useful modeling tool. More generally, it is a starting point for approximation algorithms. We develop bounds on the optimality gap, that is, the suboptimality introduced by using the control produced by the Taylored equation. These bounds can be viewed as a conceptual underpinning, analytical rather than relying on weak convergence arguments, for the good performance of controls derived from Brownian approximations. We prove that under suitable conditions and for suitably "large" initial states, (1) the optimality gap is smaller than a (1 - α) fraction of the optimal value, where α ∈ (0, 1) is the discount factor, and (2) the gap can be further expressed as the infinite-horizon discounted value with a "lower-order" per-period reward. Computationally, our framework leads to an "aggregation" approach with performance guarantees. Although the guarantees are grounded in partial differential equation theory, the practical use of this approach requires no knowledge of that theory.
AB - We introduce a framework for approximate dynamic programming that we apply to discrete-time chains on ℤ_+^d with countable action sets. The framework is grounded in the approximation of the (controlled) chain's generator by that of another Markov process. In simple terms, our approach stipulates applying a second-order Taylor expansion to the value function, replacing the Bellman equation with one in continuous space and time in which the transition matrix is reduced to its first and second moments. In some cases, the resulting equation can be interpreted as a Hamilton-Jacobi-Bellman equation for a Brownian control problem. When tractable, the "Taylored" equation serves as a useful modeling tool. More generally, it is a starting point for approximation algorithms. We develop bounds on the optimality gap, that is, the suboptimality introduced by using the control produced by the Taylored equation. These bounds can be viewed as a conceptual underpinning, analytical rather than relying on weak convergence arguments, for the good performance of controls derived from Brownian approximations. We prove that under suitable conditions and for suitably "large" initial states, (1) the optimality gap is smaller than a (1 - α) fraction of the optimal value, where α ∈ (0, 1) is the discount factor, and (2) the gap can be further expressed as the infinite-horizon discounted value with a "lower-order" per-period reward. Computationally, our framework leads to an "aggregation" approach with performance guarantees. Although the guarantees are grounded in partial differential equation theory, the practical use of this approach requires no knowledge of that theory.
KW - Approximate dynamic programming
KW - Approximate policy iteration
KW - Hamilton-Jacobi-Bellman equation
KW - Markov decision process
KW - Taylor expansion
UR - http://www.scopus.com/inward/record.url?scp=85089537073&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089537073&partnerID=8YFLogxK
U2 - 10.1287/opre.2019.1903
DO - 10.1287/opre.2019.1903
M3 - Article
AN - SCOPUS:85089537073
SN - 0030-364X
VL - 68
SP - 631
EP - 654
JO - Operations Research
JF - Operations Research
IS - 2
ER -