Biological implementation of the temporal difference algorithm for reinforcement learning: Theoretical comment on O'Reilly et al. (2007)

James C. Houk*

*Corresponding author for this work

Research output: Contribution to journal, Review article, peer-review

Abstract

The ability to survive in the world depends critically on the brain's capacity to detect earlier and earlier predictors of reward or punishment. The dominant theoretical perspective for understanding this capacity has been the temporal difference (TD) algorithm for reinforcement learning. In this issue of Behavioral Neuroscience, R. C. O'Reilly, M. J. Frank, T. E. Hazy, and B. Watz (2007; see record 2007-02025-004) propose a new model, dubbed primary value and learned value (PVLV), that is simpler than TD and that they claim is biologically more realistic. In this commentary, the author suggests slight modifications to a previous biological implementation of TD rather than adopting the new PVLV algorithm.

Original language: English (US)
Pages (from-to): 231-232
Number of pages: 2
Journal: Behavioral Neuroscience
Volume: 121
Issue number: 1
State: Published - Feb 2007

Keywords

  • Basal ganglia
  • Credit assignment
  • Dopamine
  • Reinforcement learning
  • Temporal difference

ASJC Scopus subject areas

  • Behavioral Neuroscience
