Real-Time Rideshare Driver Supply Values Using Online Reinforcement Learning

Benjamin Han, Hyungjun Lee, Sébastien Martin

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

7 Scopus citations

Abstract

In this paper, we present Online Supply Values (OSV), a system that estimates the expected return of available rideshare drivers in order to match drivers to ride requests at Lyft. Because a driver's future state can be accurately predicted from a request's destination, it is possible to model the assignment of a ride request to an available driver as a Markov Decision Process and to estimate its expected action value using the Bellman equation. These estimates are updated using temporal-difference learning and are shown to adapt to changing marketplace conditions in real time. While reinforcement learning has been studied for rideshare dispatch, fully online approaches without offline priors or other guardrails had never been evaluated in the real world. This work presents the algorithmic changes needed to bridge this gap. OSV is now deployed globally as a core component of Lyft's dispatch matching system. Our A/B user experiments in major US cities measure a +(0.96±0.53)% increase in the request fulfillment rate and a +(0.73±0.22)% increase in profit per passenger session over the previous algorithm.
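The abstract names two ingredients, a Bellman-style action value and a temporal-difference update, without giving details. The following is a minimal illustrative sketch of those two ideas in tabular form, not the OSV system itself. It assumes a driver's state is just a coarse zone label, and the class name, method names, zone names, rewards, learning rate, and discount factor are all hypothetical choices made for this example.

```python
from collections import defaultdict


class SupplyValueTable:
    """Tabular TD(0) estimate of the value of an available driver per zone.

    Hypothetical sketch: the zone-only state and the constants below are
    assumptions for illustration, not details taken from the paper.
    """

    def __init__(self, learning_rate=0.1, discount=0.9):
        self.learning_rate = learning_rate
        self.discount = discount
        self.values = defaultdict(float)  # zone -> estimated value, starts at 0

    def action_value(self, reward, destination_zone):
        # Bellman backup: immediate reward plus the discounted value of the
        # state the driver is expected to reach (the request destination).
        return reward + self.discount * self.values[destination_zone]

    def update(self, origin_zone, reward, destination_zone):
        # TD(0): move the origin estimate toward the one-step target.
        target = self.action_value(reward, destination_zone)
        td_error = target - self.values[origin_zone]
        self.values[origin_zone] += self.learning_rate * td_error
        return td_error


# Two observed assignments, processed as they arrive (online updates).
table = SupplyValueTable(learning_rate=0.5, discount=0.9)
first_error = table.update("downtown", reward=10.0, destination_zone="airport")
second_error = table.update("airport", reward=4.0, destination_zone="downtown")
```

In this toy setup, a matcher could compare `table.action_value(reward, destination_zone)` across candidate driver-request pairs. Because every completed assignment triggers an update, the estimates shift as conditions change, which is the adaptive behavior the abstract describes in general terms.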

Original language: English (US)
Title of host publication: KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 2968-2976
Number of pages: 9
ISBN (Electronic): 9781450393850
DOIs
State: Published - Aug 14 2022
Event: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 - Washington, United States
Duration: Aug 14 2022 → Aug 18 2022

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Country/Territory: United States
City: Washington
Period: 8/14/22 → 8/18/22

Keywords

  • adaptive
  • dispatch
  • matching
  • multi-agent reinforcement learning
  • on-policy control
  • online learning
  • real-time
  • rideshare
  • streaming
  • temporal difference
  • transportation

ASJC Scopus subject areas

  • Software
  • Information Systems
