Cold Diffusion on the Replay Buffer: Learning to Plan from Known Good States

Zidan Wang, Takeru Oba, Takuma Yoneda, Rui Shen, Matthew R. Walter, Bradly Stadie

Research output: Contribution to journal › Conference article › peer-review

Abstract

Learning from demonstrations (LfD) has successfully trained robots to exhibit remarkable generalization capabilities. However, many powerful imitation techniques do not prioritize the feasibility of the robot behaviors they generate. In this work, we explore the feasibility of plans produced by LfD. As in prior work, we employ a temporal diffusion model with fixed start and goal states to facilitate imitation through in-painting. Unlike previous studies, we apply cold diffusion to ensure the optimization process is directed through the agent's replay buffer of previously visited states. This routing approach increases the likelihood that the final trajectories will predominantly occupy the feasible region of the robot's state space. We test this method in simulated robotic environments with obstacles and observe a significant improvement in the agent's ability to avoid these obstacles during planning.
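The core idea in the abstract — a cold-diffusion forward process that degrades trajectories toward previously visited states rather than toward Gaussian noise — can be illustrated with a minimal sketch. The function names, the nearest-neighbor lookup, and the linear interpolation schedule below are illustrative assumptions, not the paper's exact operator:

```python
import numpy as np

def nearest_buffer_states(traj, buffer):
    # For each trajectory state, find its nearest neighbor (Euclidean
    # distance) among previously visited states in the replay buffer.
    dists = np.linalg.norm(traj[:, None, :] - buffer[None, :, :], axis=-1)
    return buffer[np.argmin(dists, axis=1)]

def degrade(traj, buffer, t, T):
    # Cold-diffusion-style forward operator (illustrative): instead of
    # adding Gaussian noise, interpolate each state toward its nearest
    # replay-buffer state, so intermediate trajectories stay near the
    # feasible region the agent has actually visited.
    alpha = t / T  # degradation strength in [0, 1]
    return (1 - alpha) * traj + alpha * nearest_buffer_states(traj, buffer)

# Toy example with 2-D states (hypothetical data, for shape only).
rng = np.random.default_rng(0)
buffer = rng.uniform(-1, 1, size=(128, 2))   # previously visited states
traj = rng.uniform(-1, 1, size=(16, 2))      # candidate plan
fully_degraded = degrade(traj, buffer, t=10, T=10)
```

At `t = 0` the trajectory is untouched; at `t = T` every state lies exactly on a known-good buffer state, which is what lets the reverse (restoration) process be anchored to feasible regions.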

Original language: English (US)
Journal: Proceedings of Machine Learning Research
Volume: 229
State: Published - 2023
Event: 7th Conference on Robot Learning, CoRL 2023 - Atlanta, United States
Duration: Nov 6, 2023 – Nov 9, 2023

Funding

We would like to thank David Yunis for providing crucial advice on making the implementation of our forward diffusion process tremendously faster through vectorization. We also express our gratitude to our colleagues and reviewers for their invaluable feedback and insights that significantly improved this paper. This work was supported in part by the National Science Foundation under HDR TRIPODS (No. 2216899).

Keywords

  • Diffusion
  • Imitation Learning
  • Planning
  • Safety

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
