Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations

Feng Gao*, Liangzhi Shi, Shenao Zhang, Zhaoran Wang, Yi Wu*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Recent advancements in differentiable simulators highlight the potential of policy optimization using simulation gradients. Yet, these approaches are largely contingent on the continuity and smoothness of the simulation, which precludes the use of certain simulation engines, such as MuJoCo. To tackle this challenge, we introduce the adaptive analytic gradient. This method views the Q function as a surrogate for future returns, consistent with the Bellman equation. By analyzing the variance of batched gradients, our method automatically switches to the more robust Q function for gradient computation when it encounters rough simulation transitions. We also put forth the Adaptive-Gradient Policy Optimization (AGPO) algorithm, which leverages our proposed method for policy learning. On the theoretical side, we prove AGPO's convergence and highlight its stable performance under non-smooth dynamics owing to its low gradient variance. On the empirical side, our results show that AGPO effectively mitigates the challenges posed by non-smoothness in policy learning through differentiable simulation.
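For intuition, the following is a minimal sketch of the selection rule described in the abstract: use the batch variance of the analytic simulation gradients to decide whether to fall back to the Q-function surrogate gradient. The function name, the `var_threshold` parameter, and the simple thresholding heuristic are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def adaptive_policy_gradient(sim_grads, q_grads, var_threshold=1.0):
    """Pick a policy-gradient estimate for one batch.

    sim_grads, q_grads: arrays of shape (batch_size, n_params) holding
    per-sample gradients of the return w.r.t. policy parameters, computed
    through the differentiable simulator and through a learned Q-function
    surrogate, respectively.
    """
    # Empirical variance of the simulation gradients across the batch,
    # averaged over parameter dimensions; high variance signals a rough
    # (non-smooth) simulation transition.
    sim_var = sim_grads.var(axis=0).mean()

    # Fall back to the Q-function surrogate gradient when the simulation
    # gradient is too noisy; otherwise keep the analytic simulation gradient.
    chosen = q_grads if sim_var > var_threshold else sim_grads
    return chosen.mean(axis=0)

# Toy usage with random per-sample gradients.
rng = np.random.default_rng(0)
sim_g = rng.normal(scale=5.0, size=(64, 8))   # noisy simulation gradients
q_g = rng.normal(scale=0.5, size=(64, 8))     # smoother surrogate gradients
update_direction = adaptive_policy_gradient(sim_g, q_g, var_threshold=2.0)
```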

Original language: English (US)
Pages (from-to): 14844-14858
Number of pages: 15
Journal: Proceedings of Machine Learning Research
Volume: 235
State: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: Jul 21 2024 - Jul 27 2024

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
