TY - JOUR
T1 - An Ergodic Measure for Active Learning from Equilibrium
AU - Abraham, Ian
AU - Prabhakar, Ahalya
AU - Murphey, Todd D.
N1 - Funding Information:
Manuscript received June 30, 2019; revised May 29, 2020; accepted November 17, 2020. Date of publication January 5, 2021; date of current version July 2, 2021. This article was recommended for publication by Editor L. Tapia upon evaluation of the reviewers’ comments. This work was supported in part by the National Science Foundation under Grant CNS 1837515. (Corresponding author: Ian Abraham.) The authors are with the Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208 USA (e-mail: i-abr@u.northwestern.edu). Color versions of one or more figures in this article are available at https://doi.org/10.1109/TASE.2020.3043636. Digital Object Identifier 10.1109/TASE.2020.3043636
Publisher Copyright:
© 2004-2012 IEEE.
PY - 2021/7
Y1 - 2021/7
N2 - This article develops KL-ergodic exploration from equilibrium (KL-E3), a method for robotic systems to integrate stability into actively generating informative measurements through ergodic exploration. Ergodic exploration enables robotic systems to indirectly sample from informative spatial distributions globally, avoiding local optima, and without the need to evaluate the derivatives of the distribution against the robot dynamics. Using hybrid systems theory, we derive a controller that allows a robot to exploit equilibrium policies (i.e., policies that solve a task) while allowing the robot to explore and generate informative data using an ergodic measure that can extend to high-dimensional states. We show that our method is able to maintain Lyapunov attractiveness with respect to the equilibrium task while actively generating data for learning tasks, such as Bayesian optimization, model learning, and off-policy reinforcement learning. In each example, we show that our proposed method is capable of generating an informative distribution of data while synthesizing smooth control signals. We illustrate these examples using simulated systems and provide a simplification of our method for real-time online learning in robotic systems. Note to Practitioners - Robotic systems need to adapt to sensor measurements and learn to exploit an understanding of the world around them such that they can truly begin to experiment in the real world. Standard learning methods do not have any restrictions on how the robot can explore and learn, making the robot dynamically volatile. Those that do are often too restrictive in terms of the stability of the robot, resulting in a lack of improved learning due to poor data collection. Applying our method would allow robotic systems to be able to adapt online without the need for human intervention. We show that, by considering both the dynamics of the robot and the statistics of where the robot has been, we are able to naturally encode where the robot needs to explore and collect measurements for efficient learning that is dynamically safe. With our method, we are able to effectively learn while being energetically efficient compared with state-of-the-art active learning methods. Our approach accomplishes such tasks in a single execution of the robotic system, i.e., the robot does not need human intervention to reset it. Future work will consider multiagent robotic systems that actively learn and explore in a team of collaborative robots.
AB - This article develops KL-ergodic exploration from equilibrium (KL-E3), a method for robotic systems to integrate stability into actively generating informative measurements through ergodic exploration. Ergodic exploration enables robotic systems to indirectly sample from informative spatial distributions globally, avoiding local optima, and without the need to evaluate the derivatives of the distribution against the robot dynamics. Using hybrid systems theory, we derive a controller that allows a robot to exploit equilibrium policies (i.e., policies that solve a task) while allowing the robot to explore and generate informative data using an ergodic measure that can extend to high-dimensional states. We show that our method is able to maintain Lyapunov attractiveness with respect to the equilibrium task while actively generating data for learning tasks, such as Bayesian optimization, model learning, and off-policy reinforcement learning. In each example, we show that our proposed method is capable of generating an informative distribution of data while synthesizing smooth control signals. We illustrate these examples using simulated systems and provide a simplification of our method for real-time online learning in robotic systems. Note to Practitioners - Robotic systems need to adapt to sensor measurements and learn to exploit an understanding of the world around them such that they can truly begin to experiment in the real world. Standard learning methods do not have any restrictions on how the robot can explore and learn, making the robot dynamically volatile. Those that do are often too restrictive in terms of the stability of the robot, resulting in a lack of improved learning due to poor data collection. Applying our method would allow robotic systems to be able to adapt online without the need for human intervention. We show that, by considering both the dynamics of the robot and the statistics of where the robot has been, we are able to naturally encode where the robot needs to explore and collect measurements for efficient learning that is dynamically safe. With our method, we are able to effectively learn while being energetically efficient compared with state-of-the-art active learning methods. Our approach accomplishes such tasks in a single execution of the robotic system, i.e., the robot does not need human intervention to reset it. Future work will consider multiagent robotic systems that actively learn and explore in a team of collaborative robots.
KW - Active exploration
KW - active learning
KW - online learning
KW - stable learning
UR - http://www.scopus.com/inward/record.url?scp=85099234716&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099234716&partnerID=8YFLogxK
U2 - 10.1109/TASE.2020.3043636
DO - 10.1109/TASE.2020.3043636
M3 - Article
AN - SCOPUS:85099234716
SN - 1545-5955
VL - 18
SP - 917
EP - 931
JO - IEEE Transactions on Automation Science and Engineering
JF - IEEE Transactions on Automation Science and Engineering
IS - 3
M1 - 9312988
ER -