TY - JOUR
T1 - Artificial intelligence and dichotomania
AU - McShane, Blakeley B.
AU - Gal, David
AU - Duhachek, Adam
N1 - Publisher Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of Society for Judgment and Decision Making and European Association for Decision Making.
PY - 2025
Y1 - 2025
N2 - Large language models (LLMs) such as ChatGPT, Gemini, and Claude are increasingly being used in aid or place of human judgment and decision making. Indeed, academic researchers are increasingly using LLMs as a research tool. In this paper, we examine whether LLMs, like academic researchers, fall prey to a particularly common human error in interpreting statistical results, namely ‘dichotomania’ that results from the dichotomization of statistical results into the categories ‘statistically significant’ and ‘statistically nonsignificant’. We find that ChatGPT, Gemini, and Claude fall prey to dichotomania at the 0.05 and 0.10 thresholds commonly used to declare ‘statistical significance’. In addition, prompt engineering with principles taken from an American Statistical Association Statement on Statistical Significance and P-values intended as a corrective to human errors does not mitigate this and arguably exacerbates it. Further, more recent and larger versions of these models do not necessarily perform better. Finally, these models sometimes provide interpretations that are not only incorrect but also highly erratic.
KW - P-value
KW - artificial intelligence
KW - large language models
KW - null hypothesis significance testing
KW - sociology of science
KW - statistical significance
UR - http://www.scopus.com/inward/record.url?scp=105004319356&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105004319356&partnerID=8YFLogxK
U2 - 10.1017/jdm.2025.7
DO - 10.1017/jdm.2025.7
M3 - Article
AN - SCOPUS:105004319356
SN - 1930-2975
VL - 20
JO - Judgment and Decision Making
JF - Judgment and Decision Making
M1 - e23
ER -