Value-based policy teaching with active indirect elicitation

Haoqi Zhang*, David Parkes

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

Many situations arise in which an interested party's utility is dependent on the actions of an agent; e.g., a teacher is interested in a student learning effectively and a firm is interested in a consumer's behavior. We consider an environment in which the interested party can provide incentives to affect the agent's actions but cannot otherwise enforce actions. In value-based policy teaching, we situate this within the framework of sequential decision tasks modeled by Markov Decision Processes, and seek to associate limited rewards with states that induce the agent to follow a policy that maximizes the total expected value of the interested party. We show value-based policy teaching is NP-hard and provide a mixed integer program formulation. Focusing in particular on environments in which the agent's reward is unknown to the interested party, we provide a method for active indirect elicitation wherein the agent's reward function is inferred from observations about its response to incentives. Experimental results suggest that we can generally find the optimal incentive provision in a small number of elicitation rounds.
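The incentive-design problem described in the abstract can be illustrated on a toy example. The sketch below is not the paper's mixed integer program; it is a brute-force search over a coarse grid of per-state incentives, assuming a hypothetical three-state chain MDP, made-up reward vectors (`AGENT_R`, `PARTY_G`), a discount factor of 0.9, and a simple cap on the sum of incentives. It also assumes the agent's reward is known, so it does not cover the elicitation setting. For each candidate incentive, the agent plans against its own reward plus the incentive, and the interested party scores the resulting policy under its own reward.

```python
from itertools import product

GAMMA = 0.9  # discount factor (assumed)

# Hypothetical 3-state chain: action 0 = "stay", action 1 = "advance".
# P[s][a] is a list of (probability, next_state) pairs.
P = {
    0: {0: [(1.0, 0)], 1: [(1.0, 1)]},
    1: {0: [(1.0, 1)], 1: [(1.0, 2)]},
    2: {0: [(1.0, 2)], 1: [(1.0, 2)]},
}
AGENT_R = [1.0, 0.0, 0.5]  # agent's own per-state reward (assumed)
PARTY_G = [0.0, 0.0, 2.0]  # interested party's per-state reward (assumed)


def q_value(s, a, reward, V):
    """One-step lookahead value of taking action a in state s."""
    return reward[s] + GAMMA * sum(p * V[t] for p, t in P[s][a])


def greedy_policy(reward, iters=500):
    """Value iteration, then the greedy policy for the given reward."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [max(q_value(s, a, reward, V) for a in P[s]) for s in P]
    return [max(P[s], key=lambda a: q_value(s, a, reward, V)) for s in P]


def evaluate(policy, reward, iters=500):
    """Iterative policy evaluation of a fixed policy under a reward."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [q_value(s, policy[s], reward, V) for s in P]
    return V


def best_incentive(budget, step=0.5, start=0):
    """Grid search for the incentive that maximizes the party's value."""
    levels = [i * step for i in range(int(budget / step) + 1)]
    best = None
    for delta in product(levels, repeat=len(P)):
        if sum(delta) > budget:
            continue  # violates the (assumed) total-incentive cap
        shaped = [r + d for r, d in zip(AGENT_R, delta)]
        policy = greedy_policy(shaped)  # agent's response to the incentive
        value = evaluate(policy, PARTY_G)[start]
        if best is None or value > best[0] + 1e-9:
            best = (value, delta, policy)
    return best


if __name__ == "__main__":
    value, delta, policy = best_incentive(budget=1.0)
    print("incentive:", delta, "policy:", policy, "party value:", round(value, 3))
```

In this toy instance the unmotivated agent stays in state 0, which is worth nothing to the interested party; placing the whole budget on state 2 is the only grid point that induces the agent to advance. Exhaustive search scales exponentially in the number of states, which is consistent with the hardness result stated in the abstract and is why a solver-based formulation would be needed beyond toy sizes.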

Original language: English (US)
Title of host publication: AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference
Pages: 208-214
Number of pages: 7
State: Published - Dec 24 2008
Event: 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08 - Chicago, IL, United States
Duration: Jul 13 2008 → Jul 17 2008

Publication series

Name: Proceedings of the National Conference on Artificial Intelligence
Volume: 1

Other

Other: 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08
Country: United States
City: Chicago, IL
Period: 7/13/08 → 7/17/08

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence


  • Cite this

    Zhang, H., & Parkes, D. (2008). Value-based policy teaching with active indirect elicitation. In AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference (pp. 208-214). (Proceedings of the National Conference on Artificial Intelligence; Vol. 1).