TY - GEN
T1 - Learning robot motion control with demonstration and advice-operators
AU - Argall, Brenna D.
AU - Browning, Brett
AU - Veloso, Manuela
PY - 2008
Y1 - 2008
AB - As robots become more commonplace within society, the need for tools to enable non-robotics-experts to develop control algorithms, or policies, will increase. Learning from Demonstration (LfD) offers one promising approach, where the robot learns a policy from teacher task executions. Our interests lie with robot motion control policies, which map world observations to continuous low-level actions. In this work, we introduce Advice-Operator Policy Improvement (A-OPI) as a novel approach for improving policies within LfD. Two distinguishing characteristics of the A-OPI algorithm are its data source and its continuous state-action space. Within LfD, more example data can improve a policy. In A-OPI, new data is synthesized from a student execution and teacher advice. By contrast, typical demonstration approaches provide the learner with exclusively teacher executions. A-OPI is effective within continuous state-action spaces because high-level human advice is translated into continuous-valued corrections on the student execution. This work presents a first implementation of the A-OPI algorithm, validated on a Segway RMP robot performing a spatial positioning task. A-OPI is found to improve task performance, in both success and accuracy. Furthermore, performance is shown to be similar or superior to the typical approach of providing exclusively teacher demonstrations.
UR - http://www.scopus.com/inward/record.url?scp=69549135371&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69549135371&partnerID=8YFLogxK
DO - 10.1109/IROS.2008.4651020
M3 - Conference contribution
AN - SCOPUS:69549135371
SN - 9781424420582
T3 - 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
SP - 399
EP - 404
BT - 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
T2 - 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
Y2 - 22 September 2008 through 26 September 2008
ER -