Adaptive Discretization for Episodic Reinforcement Learning in Metric Spaces

Sean R. Sinclair*, Siddhartha Banerjee, Christina Lee Yu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel Q-learning policy with adaptive, data-driven discretization. The central idea is to maintain a finer partition of the state-action space in regions that are frequently visited in historical trajectories and have higher payoff estimates. We demonstrate how our adaptive partitions take advantage of the shape of the optimal Q-function and of the joint space, without sacrificing worst-case performance. In particular, we recover the regret guarantees of prior algorithms for continuous state-action spaces, which additionally require either an optimal discretization as input, access to a simulation oracle, or both. Moreover, experiments demonstrate that our algorithm automatically adapts to the underlying structure of the problem, resulting in much better performance compared both to heuristics and to Q-learning with uniform discretization.
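To make the central idea concrete, the following is a minimal, illustrative sketch (not the authors' released implementation; see the linked repository for that) of Q-learning with adaptive discretization on a one-dimensional state and action space, each normalized to [0,1]. Every leaf "ball" of a dyadic partition of the state-action square keeps an optimistic Q estimate and a visit count, and a ball is subdivided once it has been visited often enough relative to its diameter. The splitting threshold, bonus constants, and class names here are assumptions chosen for readability.

```python
import math


class Ball:
    """A square cell of the adaptive partition over [0,1]^2 (state x action)."""

    def __init__(self, s_lo, s_hi, a_lo, a_hi, q_init):
        self.s_lo, self.s_hi = s_lo, s_hi
        self.a_lo, self.a_hi = a_lo, a_hi
        self.q = q_init          # optimistic Q estimate for this cell
        self.n = 0               # number of times this cell was selected
        self.children = []       # populated once the cell is split

    def diam(self):
        # Cells are split evenly in both dimensions, so they stay square
        # and one side length suffices as a diameter proxy.
        return self.s_hi - self.s_lo

    def contains_state(self, s):
        return self.s_lo <= s <= self.s_hi


class AdaptiveQLearner:
    """Sketch of Q-learning with data-driven discretization on [0,1]^2."""

    def __init__(self, horizon):
        self.H = horizon
        # Start from a single cell covering the whole space, with Q = H
        # as the standard optimistic initialization.
        self.root = Ball(0.0, 1.0, 0.0, 1.0, q_init=horizon)

    def active_balls(self, s):
        """Leaves of the partition tree whose state slice contains s."""
        stack, leaves = [self.root], []
        while stack:
            b = stack.pop()
            if not b.contains_state(s):
                continue
            if b.children:
                stack.extend(b.children)
            else:
                leaves.append(b)
        return leaves

    def select(self, s):
        """Greedy rule: pick the relevant ball with the highest Q estimate."""
        return max(self.active_balls(s), key=lambda b: b.q)

    def update(self, ball, reward, v_next):
        """One-step optimistic Q-update with the H-dependent learning rate."""
        ball.n += 1
        t = ball.n
        alpha = (self.H + 1) / (self.H + t)             # standard rate for episodic Q-learning
        bonus = 2.0 / math.sqrt(t) + 2.0 * ball.diam()  # confidence + discretization bonus (illustrative constants)
        ball.q = (1 - alpha) * ball.q + alpha * (reward + v_next + bonus)
        # Split once the cell has been visited enough relative to its size,
        # so frequently visited, high-payoff regions get a finer partition.
        if t >= (1.0 / ball.diam()) ** 2 and not ball.children:
            self.split(ball)

    def split(self, b):
        """Subdivide a cell into four equal quadrants inheriting its estimate."""
        sm = (b.s_lo + b.s_hi) / 2
        am = (b.a_lo + b.a_hi) / 2
        b.children = [
            Ball(lo_s, hi_s, lo_a, hi_a, q_init=b.q)
            for (lo_s, hi_s) in ((b.s_lo, sm), (sm, b.s_hi))
            for (lo_a, hi_a) in ((b.a_lo, am), (am, b.a_hi))
        ]
```

In this sketch, an episode alternates `select` and `update` at each step, with `v_next` taken as the largest Q estimate among the balls relevant to the next state; the splitting rule is what biases refinement toward regions the greedy policy actually visits.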

Original language: English (US)
Pages (from-to): 17-18
Number of pages: 2
Journal: Performance Evaluation Review
Volume: 48
Issue number: 1
State: Published - Jul 8 2020

Funding

The full paper is available at https://dl.acm.org/doi/10.1145/3366703. The code for the experiments is available at https://github.com/seanrsinclair/AdaptiveQLearning. We gratefully acknowledge funding from the NSF under grants ECCS-1847393 and DMS-1839346, and the ARL under grant W911NF-17-1-0094.

Keywords

  • adaptive discretization
  • metric spaces
  • model-free
  • q-learning
  • reinforcement learning

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
