A Language-First Approach to Procedure Planning

Jiateng Liu*, Sha Li*, Zhenhailong Wang, Manling Li, Heng Ji

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Procedure planning, or the ability to predict a series of steps that can achieve a given goal conditioned on the current observation, is critical for building intelligent embodied agents that can assist users in everyday tasks. Encouraged by the recent success of language models (LMs) for zero-shot (Huang et al., 2022a; Ahn et al., 2022) and few-shot planning (Micheli and Fleuret, 2021), we hypothesize that LMs may be equipped with stronger priors for planning compared to their visual counterparts. To this end, we propose a language-first procedure planning framework with modularized design: we first align the current and goal observations with corresponding steps and then use a pre-trained LM to predict the intermediate steps. Under this framework, we find that using an image captioning model for alignment can already match state-of-the-art performance and by designing a double retrieval model conditioned over current and goal observations jointly, we can achieve large improvements (19.2% - 98.9% relatively higher success rate than state-of-the-art) on both COIN (Tang et al., 2019) and CrossTask (Zhukov et al., 2019) benchmarks. Our work verifies the planning ability of LMs and demonstrates how LMs can serve as a powerful “reasoning engine” even when the input is provided in another modality.

Original languageEnglish (US)
Title of host publicationFindings of the Association for Computational Linguistics, ACL 2023
PublisherAssociation for Computational Linguistics (ACL)
Pages1941-1954
Number of pages14
ISBN (Electronic)9781959429623
StatePublished - 2023
Event61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: Jul 9 2023Jul 14 2023

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

Conference61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/TerritoryCanada
CityToronto
Period7/9/237/14/23

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'A Language-First Approach to Procedure Planning'. Together they form a unique fingerprint.

Cite this