THIRD-PERSON IMITATION LEARNING

Bradly C. Stadie, Pieter Abbeel, Ilya Sutskever

Research output: Contribution to conferencePaperpeer-review

146 Scopus citations

Abstract

Reinforcement learning (RL) makes it possible to train agents capable of achieving sophisticated goals in complex and uncertain environments. A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize. Traditionally, imitation learning in RL has been used to overcome this problem. Unfortunately, hitherto imitation learning methods tend to require that demonstrations are supplied in the first-person: the agent is provided with a sequence of states and a specification of the actions that it should have taken. While powerful, this kind of imitation learning is limited by the relatively hard problem of collecting first-person demonstrations. Humans address this problem by learning from third-person demonstrations: they observe other humans perform tasks, infer the task, and accomplish the same task themselves. In this paper, we present a method for unsupervised third-person imitation learning. Here third-person refers to training an agent to correctly achieve a simple goal in a simple environment when it is provided a demonstration of a teacher achieving the same goal but from a different viewpoint; and unsupervised refers to the fact that the agent receives only these third-person demonstrations, and is not provided a correspondence between teacher states and student states. Our methods primary insight is that recent advances from domain confusion can be utilized to yield domain agnostic features which are crucial during the training process. To validate our approach, we report successful experiments on learning from third-person demonstrations in a pointmass domain, a reacher domain, and inverted pendulum.

Original languageEnglish (US)
StatePublished - 2017
Event5th International Conference on Learning Representations, ICLR 2017 - Toulon, France
Duration: Apr 24 2017Apr 26 2017

Conference

Conference5th International Conference on Learning Representations, ICLR 2017
Country/TerritoryFrance
CityToulon
Period4/24/174/26/17

Funding

This work was done partially at OpenAI and partially at Berkeley. Work done at Berkeley was supported in part by Darpa under the Simplex program and the FunLoL program.

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'THIRD-PERSON IMITATION LEARNING'. Together they form a unique fingerprint.

Cite this