Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics

Shenao Zhang, Wanxin Jin, Zhaoran Wang

Research output: Contribution to journal › Conference article › peer-review

4 Scopus citations

Abstract

Differentiable physics-based simulators have witnessed remarkable success in robot learning involving contact dynamics, benefiting from their improved accuracy and efficiency in solving the underlying complementarity problem. However, when utilizing the First-Order Policy Gradient (FOPG) method, our theory indicates that the complementarity-based systems suffer from stiffness, leading to an explosion in the gradient variance of FOPG. As a result, optimization becomes challenging due to chaotic and non-smooth loss landscapes. To tackle this issue, we propose a novel approach called Adaptive Barrier Smoothing (ABS), which introduces a class of softened complementarity systems that correspond to barrier-smoothed objectives. With a contact-aware adaptive central-path parameter, ABS reduces the FOPG gradient variance while controlling the gradient bias. We justify the adaptive design by analyzing the roots of the system's stiffness. Additionally, we establish the convergence of FOPG and show that ABS achieves a reasonable trade-off between the gradient variance and bias by providing their upper bounds. Moreover, we present a variant of FOPG based on complementarity modeling that efficiently fits the contact dynamics by learning the physical parameters. Experimental results on various robotic tasks are provided to support our theory and method.
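As a rough illustration of the idea of softening a complementarity system with a central-path parameter, the sketch below uses a one-dimensional contact problem. It is not the authors' implementation: it assumes the standard interior-point relaxation, in which the hard condition `lam >= 0, w >= 0, lam * w = 0` (with `w = a * lam + b`) is replaced by `lam * w = mu` for some `mu > 0`. The names `a`, `b`, `mu`, and all function names are hypothetical, and the adaptive, contact-aware choice of the parameter described in the abstract is not reproduced here.

```python
import math


def hard_contact_force(b, a=1.0):
    """Solve the 1-D complementarity problem lam >= 0, a*lam + b >= 0, lam*(a*lam + b) = 0.

    Assumes a > 0. The solution is non-smooth in b at b = 0.
    """
    return max(0.0, -b / a)


def smoothed_contact_force(b, mu, a=1.0):
    """Positive root of a*lam**2 + b*lam - mu = 0, i.e. lam*(a*lam + b) = mu.

    mu plays the role of a central-path parameter (assumed relaxation).
    """
    return (-b + math.sqrt(b * b + 4.0 * a * mu)) / (2.0 * a)


def smoothed_force_grad(b, mu, a=1.0):
    """Derivative of the smoothed force with respect to b (requires mu > 0)."""
    return (-1.0 + b / math.sqrt(b * b + 4.0 * a * mu)) / (2.0 * a)


if __name__ == "__main__":
    # Larger mu gives a gentler gradient transition around b = 0 (less stiffness)
    # but moves the force further from the hard-contact solution (more bias).
    for mu in (1e-4, 1e-2, 1e-1):
        for b in (-0.1, 0.0, 0.1):
            bias = smoothed_contact_force(b, mu) - hard_contact_force(b)
            print(f"mu={mu:g} b={b:+.1f} grad={smoothed_force_grad(b, mu):+.4f} bias={bias:.4f}")
```

In this toy problem the hard-contact gradient jumps from `-1/a` to `0` at `b = 0`, while the smoothed gradient passes continuously through `-1/(2a)` there for any `mu > 0`. That is the kind of stiffness-versus-bias trade-off the abstract refers to, shown here only under the stated assumptions.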

Original language: English (US)
Pages (from-to): 41219-41243
Number of pages: 25
Journal: Proceedings of Machine Learning Research
Volume: 202
State: Published - 2023
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: Jul 23 2023 → Jul 29 2023

Funding

Zhaoran Wang acknowledges the National Science Foundation (Awards 2225087, 2211210, 2048075, 2015568, 2008827, 1934931/2216970), the Simons Institute (Theory of Reinforcement Learning), Amazon, J.P. Morgan, Two Sigma, and Tencent for their support.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
