Gender Stereotypes in Natural Language: Word Embeddings Show Robust Consistency Across Child and Adult Language Corpora of More Than 65 Million Words

Tessa E.S. Charlesworth*, Victor Yang, Thomas C. Mann, Benedek Kurdi, Mahzarin R. Banaji

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

63 Scopus citations

Abstract

Stereotypes are associations between social groups and semantic attributes that are widely shared within societies. The spoken and written language of a society affords a unique way to measure the magnitude and prevalence of these widely shared collective representations. Here, we used word embeddings to systematically quantify gender stereotypes in language corpora that are unprecedented in size (65+ million words) and scope (child and adult conversations, books, movies, TV). Across corpora, gender stereotypes emerged consistently and robustly for both theoretically selected stereotypes (e.g., work–home) and comprehensive lists of more than 600 personality traits and more than 300 occupations. Despite underlying differences across language corpora (e.g., time periods, formats, age groups), results revealed the pervasiveness of gender stereotypes in every corpus. Using gender stereotypes as the focal issue, we unite 19th-century theories of collective representations and 21st-century evidence on implicit social cognition to understand the subtle yet persistent presence of collective representations in language.
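
The abstract does not spell out how an embedding-based stereotype score is computed, so the following is only a minimal sketch of one common approach, assuming a cosine-similarity association measure summarized as a standardized effect size. The function names (`cosine`, `association`, `effect_size`) and the two-dimensional toy vectors are hypothetical illustrations, not the authors' materials or data; real analyses would use vectors trained on the corpora themselves.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, group_a, group_b):
    """Mean similarity of one word to group A minus its mean similarity to group B."""
    sim_a = mean([cosine(word_vec, a) for a in group_a])
    sim_b = mean([cosine(word_vec, b) for b in group_b])
    return sim_a - sim_b


def effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Standardized difference in association between two sets of target words.

    Assumption: a pooled sample standard deviation over all target words is
    used as the denominator; other standardizations are possible.
    """
    scores_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    scores_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(scores_x) - mean(scores_y)) / stdev(scores_x + scores_y)


# Hypothetical toy vectors (NOT real embeddings), chosen only so the code runs.
male = [[1.0, 0.1], [0.9, 0.2]]
female = [[0.1, 1.0], [0.2, 0.9]]
work = [[0.8, 0.3], [1.0, 0.0]]
home = [[0.3, 0.8], [0.0, 1.0]]

d = effect_size(work, home, male, female)
print(f"work-home effect size on toy vectors: {d:.2f}")
```

A positive value here means only that the toy "work" vectors sit closer to the toy "male" vectors than the toy "home" vectors do; it says nothing about the magnitudes reported in the article.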

Original language: English (US)
Pages (from-to): 218-240
Number of pages: 23
Journal: Psychological Science
Volume: 32
Issue number: 2
State: Published - Feb 2021

Keywords

  • collective representations
  • gender stereotypes
  • machine learning
  • natural-language processing
  • open data
  • open materials
  • word embeddings

ASJC Scopus subject areas

  • General Psychology
