TY - GEN
T1 - Policy teaching through reward function learning
AU - Zhang, Haoqi
AU - Parkes, David C.
AU - Chen, Yiling
PY - 2009
Y1 - 2009
N2 - Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
AB - Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former case and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
KW - Active indirect elicitation
KW - Environment design
KW - Policy teaching
KW - Preference elicitation
KW - Preference learning
UR - http://www.scopus.com/inward/record.url?scp=77950582721&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77950582721&partnerID=8YFLogxK
U2 - 10.1145/1566374.1566417
DO - 10.1145/1566374.1566417
M3 - Conference contribution
AN - SCOPUS:77950582721
SN - 9781605584584
T3 - Proceedings of the ACM Conference on Electronic Commerce
SP - 295
EP - 304
BT - EC'09 - Proceedings of the 2009 ACM Conference on Electronic Commerce
T2 - 2009 ACM Conference on Electronic Commerce, EC'09
Y2 - 6 July 2009 through 10 July 2009
ER -