Artificial intelligence and dichotomania

Blakeley B. McShane, David Gal*, Adam Duhachek

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Large language models (LLMs) such as ChatGPT, Gemini, and Claude are increasingly being used in aid of or in place of human judgment and decision making. Indeed, academic researchers are increasingly using LLMs as a research tool. In this paper, we examine whether LLMs, like academic researchers, fall prey to a particularly common human error in interpreting statistical results, namely ‘dichotomania’, which results from the dichotomization of statistical results into the categories ‘statistically significant’ and ‘statistically nonsignificant’. We find that ChatGPT, Gemini, and Claude fall prey to dichotomania at the 0.05 and 0.10 thresholds commonly used to declare ‘statistical significance’. In addition, prompt engineering with principles taken from an American Statistical Association Statement on Statistical Significance and P-values intended as a corrective to human errors does not mitigate this and arguably exacerbates it. Further, more recent and larger versions of these models do not necessarily perform better. Finally, these models sometimes provide interpretations that are not only incorrect but also highly erratic.
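As a minimal sketch of the dichotomization the abstract describes (the values and function name here are hypothetical, purely for illustration): two nearly identical p-values straddling the conventional 0.05 threshold are forced into opposite categories, even though the underlying evidence is essentially the same.

```python
# Hypothetical illustration of 'dichotomania': classifying results solely
# by whether the p-value crosses a conventional threshold (e.g., 0.05).
def dichotomize(p: float, alpha: float = 0.05) -> str:
    """Return the dichotomous label a p-value receives under NHST."""
    return "statistically significant" if p < alpha else "statistically nonsignificant"

# Two p-values that differ only trivially receive opposite labels.
print(dichotomize(0.049))  # statistically significant
print(dichotomize(0.051))  # statistically nonsignificant
```

The paper's point is that this categorical treatment, a well-documented human error, also appears in LLM interpretations of statistical results.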

Original language: English (US)
Article number: e23
Journal: Judgment and Decision Making
Volume: 20
DOIs
State: Published - 2025

Keywords

  • P-value
  • artificial intelligence
  • large language models
  • null hypothesis significance testing
  • sociology of science
  • statistical significance

ASJC Scopus subject areas

  • General Decision Sciences
  • Applied Psychology
  • Economics and Econometrics
