Human detection of political speech deepfakes across transcripts, audio, and video

Matthew Groh*, Aruna Sankaranarayanan, Nikhil Singh, Dong Young Kim, Andrew Lippman, Rosalind Picard

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Recent advances in technology for hyper-realistic visual and audio effects provoke the concern that deepfake videos of political speeches will soon be indistinguishable from authentic video. We conduct 5 pre-registered randomized experiments with N = 2215 participants to evaluate how accurately humans distinguish real political speeches from fabrications across base rates of misinformation, audio sources, question framings with and without priming, and media modalities. We do not find that base rates of misinformation have statistically significant effects on discernment. We find that deepfakes with audio produced by state-of-the-art text-to-speech algorithms are harder to discern than the same deepfakes with voice actor audio. Moreover, across all experiments and question framings, we find that audio and visual information enables more accurate discernment than text alone: human discernment relies more on how something is said, the audio-visual cues, than what is said, the speech content.

Original language: English (US)
Article number: 7629
Journal: Nature Communications
Volume: 15
Issue number: 1
DOIs
State: Published - Dec 2024

Funding

The authors would like to acknowledge support for signing videos from Truepic and $7000 in funding for participant recruitment from Truepic for Experiments 2 through 5, as well as funding from MIT Media Lab member companies and the Kellogg School of Management. They thank Colin Cassidy, J-L Cauvin, and Austin Nasso for providing voice impressions for stimuli; the following Freesound.org users who contributed sounds for stimuli: aaronstar, aleclubin, cmilan, funwithsound, jgarc, johnsonbrandediting, klankbeeld, macohibs, mzui, noisecollector, peridactyloptrix, speedygonzo, and zabuhailo; David Rand, Gordon Pennycook, Rahul Bhui, Yunhao (Jerry) Zhang, Ziv Epstein, and members of the Affective Computing lab at the MIT Media Lab and the Human Cooperation Lab at the MIT Sloan School of Management for helpful feedback on early versions of this manuscript; Anna Murphy, Shreya Kalyan, Theo Chen, and Alicia Guo for research assistance; and Craig Ferguson for feedback on hosting the experiment.

ASJC Scopus subject areas

  • General Chemistry
  • General Biochemistry, Genetics and Molecular Biology
  • General Physics and Astronomy

