Predictive power of word surprisal for reading times is a linear function of language model quality

Adam Goodkind, Klinton Bicknell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In human sentence processing, it is known that a word’s probability in context has large effects on how long the word takes to read. This relationship has been quantified using information-theoretic surprisal, the amount of new information conveyed by a word. Here, we compare surprisal estimates derived from a collection of language models based on n-grams, neural networks, and a combination of both. We show that the models’ psychological predictive power improves as a tight linear function of language model linguistic quality. We also show that the size of the surprisal effect is estimated consistently across all types of language models. These findings point toward a surprising robustness of surprisal estimates and suggest that surprisal estimates from low-quality language models are not biased.
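Surprisal, as used here, is the negative log probability of a word given its context. A minimal sketch of the quantity (the probabilities below are illustrative made-up values, not outputs of the paper's language models):

```python
import math

def surprisal(p_word_given_context: float) -> float:
    """Surprisal in bits: -log2 P(word | context), the new
    information conveyed by the word."""
    return -math.log2(p_word_given_context)

# A highly predictable word carries little information...
print(surprisal(0.5))   # 1.0 bit
# ...while an unexpected word carries much more.
print(surprisal(0.01))  # ~6.64 bits
```

The paper's claim is that reading times grow with this quantity, and that the strength of the relationship tracks how good the underlying language model's probability estimates are.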
Original language: English (US)
Title of host publication: Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)
Editors: Asad Sayeed, Cassandra Jacobs, Tal Linzen, Marten van Schijndel
Publisher: Association for Computational Linguistics (ACL)
Pages: 10-18
Number of pages: 9
DOI: 10.18653/v1/W18-0102
Publication status: Published - 2018

Cite this

Goodkind, A., & Bicknell, K. (2018). Predictive power of word surprisal for reading times is a linear function of language model quality. In A. Sayeed, C. Jacobs, T. Linzen, & M. van Schijndel (Eds.), Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018) (pp. 10-18). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/W18-0102