TY - GEN
T1 - Learning by demonstration with critique from a human teacher
AU - Argall, Brenna
AU - Browning, Brett
AU - Veloso, Manuela
PY - 2007
Y1 - 2007
N2 - Learning by demonstration can be a powerful and natural tool for developing robot control policies. That is, instead of tedious hand-coding, a robot may learn a control policy by interacting with a teacher. In this work we present an algorithm for learning by demonstration in which the teacher operates in two phases. The teacher first demonstrates the task to the learner. The teacher next critiques learner performance of the task. This critique is used by the learner to update its control policy. In our implementation we utilize a 1-Nearest Neighbor technique which incorporates both training dataset and teacher critique. Since the teacher critiques performance only, they do not need to guess at an effective critique for the underlying algorithm. We argue that this method is particularly well-suited to human teachers, who are generally better at assigning credit to performances than to algorithms. We have applied this algorithm to the simulated task of a robot intercepting a ball. Our results demonstrate improved performance with teacher critiquing, where performance is measured by both execution success and efficiency.
UR - http://www.scopus.com/inward/record.url?scp=34548290840&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548290840&partnerID=8YFLogxK
U2 - 10.1145/1228716.1228725
DO - 10.1145/1228716.1228725
M3 - Conference contribution
AN - SCOPUS:34548290840
SN - 1595936173
SN - 9781595936172
T3 - HRI 2007 - Proceedings of the 2007 ACM/IEEE Conference on Human-Robot Interaction - Robot as Team Member
SP - 57
EP - 64
BT - HRI 2007 - Proceedings of the 2007 ACM/IEEE Conference on Human-Robot Interaction - Robot as Team Member
T2 - HRI 2007: 2007 ACM/IEEE Conference on Human-Robot Interaction - Robot as Team Member
Y2 - 8 March 2007 through 11 March 2007
ER -