Historical representations of social groups across 200 years of word embeddings from Google Books

Tessa E.S. Charlesworth, Aylin Caliskan, Mahzarin R. Banaji*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

24 Scopus citations


Using word embeddings from 850 billion words in English-language Google Books, we provide an extensive analysis of historical change and stability in social group representations (stereotypes) across a long timeframe (from 1800 to 1999), for a large number of social group targets (Black, White, Asian, Irish, Hispanic, Native American, Man, Woman, Old, Young, Fat, Thin, Rich, Poor), and their emergent, bottom-up associations with 14,000 words and a subset of 600 traits. The results provide a nuanced picture of change and persistence in stereotypes across 200 years. Change was observed in the top-associated words and traits: whether analyzing the top 10 or the top 50 associates, at least 50% of top associates changed across successive decades. Despite this changing content of top-associated words, the average valence (positivity/negativity) of these top stereotypes was generally persistent. Ultimately, through advances in the availability of historical word embeddings, this study offers a comprehensive characterization of both change and persistence in social group representations as revealed through books of the English-speaking world from 1800 to 1999.
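The two measures named in the abstract, turnover among top associates and average valence of those associates, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes cosine similarity as the association measure, and the function names, the toy two-dimensional vectors, and the valence scores are all hypothetical stand-ins for decade-specific embeddings and a real valence lexicon.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_associates(group_vec, embeddings, k):
    """Return the k words whose vectors are most similar to group_vec.

    `embeddings` maps word -> vector for a single decade (hypothetical format).
    """
    ranked = sorted(embeddings,
                    key=lambda w: cosine(group_vec, embeddings[w]),
                    reverse=True)
    return ranked[:k]

def turnover(prev_top, next_top):
    """Fraction of one decade's top associates missing from the next decade's."""
    prev_set = set(prev_top)
    return len(prev_set - set(next_top)) / len(prev_set)

def mean_valence(words, valence):
    """Average valence of a word list; `valence` maps word -> score (assumed lexicon)."""
    return sum(valence[w] for w in words) / len(words)

# Toy data only: two "decades" of made-up 2-d vectors for four trait words.
decade_a = {"kind": [1.0, 0.0], "warm": [0.9, 0.1], "cold": [0.0, 1.0], "rude": [0.1, 0.9]}
decade_b = {"kind": [1.0, 0.0], "warm": [0.0, 1.0], "cold": [0.9, 0.1], "rude": [0.1, 0.9]}
group = [1.0, 0.0]  # hypothetical group vector, held fixed for simplicity

top_a = top_associates(group, decade_a, k=2)
top_b = top_associates(group, decade_b, k=2)
change = turnover(top_a, top_b)
```

In this toy setup, one of the two top associates is replaced between decades, so `turnover` returns 0.5. The same comparison repeated over successive decades, with `mean_valence` tracked alongside it, is the kind of analysis the abstract summarizes.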

Original language: English (US)
Article number: e2121798119
Journal: Proceedings of the National Academy of Sciences of the United States of America
Issue number: 28
State: Published - Jul 12 2022


  • attitude change
  • natural language processing
  • stereotype change
  • word embeddings

ASJC Scopus subject areas

  • General

