Mitigating Adversarial Norm Training with Moral Axioms

Taylor Olson, Kenneth D. Forbus

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

This paper addresses the issue of adversarial attacks on ethical AI systems. We investigate using moral axioms and rules of deontic logic in a norm learning framework to mitigate adversarial norm training. This model of moral intuition and construction provides AI systems with moral guard rails yet still allows for learning conventions. We evaluate our approach with a questionnaire inspired by a study commonly used in moral development research, which tests an agent's ability to reach correct moral conclusions despite opposing testimony. Our findings suggest that our model can still correctly evaluate moral situations and learn conventions in an adversarial training environment. We conclude that adding axiomatic moral prohibitions and deontic inference rules to a norm learning model makes it less vulnerable to adversarial attacks.
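The core idea described in the abstract can be illustrated with a minimal sketch (not the authors' implementation; all names, the axiom set, and the deontic statuses below are illustrative assumptions): a norm learner adopts conventions from testimony, but a fixed set of axiomatic moral prohibitions acts as guard rails that adversarial testimony cannot override.

```python
# Minimal sketch of the idea in the abstract: axiomatic prohibitions
# ("moral guard rails") constrain what a norm learner will accept from
# testimony. Names and axioms here are illustrative, not the paper's.

# Actions forbidden a priori; these cannot be overridden by training.
MORAL_AXIOMS = {"harm_innocent", "steal"}

class NormLearner:
    def __init__(self):
        # Learned conventions: action -> deontic status
        # ("obligatory", "permissible", or "forbidden").
        self.conventions = {}

    def learn(self, action, status):
        """Adopt a norm from testimony unless it contradicts an axiom."""
        # Deontic consistency check: testimony cannot render an
        # axiomatically forbidden action permissible or obligatory.
        if action in MORAL_AXIOMS and status != "forbidden":
            return False  # reject adversarial testimony
        self.conventions[action] = status
        return True

    def evaluate(self, action):
        """Axioms take precedence over learned conventions."""
        if action in MORAL_AXIOMS:
            return "forbidden"
        return self.conventions.get(action, "permissible")

learner = NormLearner()
learner.learn("shake_hands", "permissible")            # convention: adopted
adopted = learner.learn("harm_innocent", "obligatory") # adversarial: rejected
print(adopted)                            # False
print(learner.evaluate("harm_innocent"))  # forbidden
print(learner.evaluate("shake_hands"))    # permissible
```

This mirrors the paper's conclusion at a toy scale: conventions remain learnable, while moral evaluations stay correct even under opposed testimony.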

Original language: English (US)
Title of host publication: AAAI-23 Technical Tracks 10
Editors: Brian Williams, Yiling Chen, Jennifer Neville
Publisher: AAAI Press
Pages: 11882-11889
Number of pages: 8
ISBN (Electronic): 9781577358800
DOIs
State: Published - Jun 27 2023
Event: 37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: Feb 7, 2023 to Feb 14, 2023

Publication series

Name: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Volume: 37

Conference

Conference: 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/Territory: United States
City: Washington
Period: 2/7/23 to 2/14/23

Funding

This research was supported by grant FA9550-20-1-0091 from the Air Force Office of Scientific Research.

ASJC Scopus subject areas

  • Artificial Intelligence
