In human sentence processing, a word’s probability in context has large effects on how long the word takes to read. This relationship has been quantified using information-theoretic surprisal, the amount of new information conveyed by a word. Here, we compare surprisal estimates derived from a collection of language models based on n-grams, neural networks, and a combination of both. We show that the models’ psychological predictive power improves as a tight linear function of language model linguistic quality. We also show that the size of the surprisal effect is estimated consistently across all types of language models. These findings point toward a surprising robustness of surprisal estimates and suggest that surprisals estimated by low-quality language models are not biased.
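Surprisal is standardly defined as the negative log probability of a word given its context. As an illustration only (not the paper's actual models), here is a minimal sketch computing surprisal in bits, with word probabilities taken from an unsmoothed unigram model over a toy corpus:

```python
import math
from collections import Counter

def surprisal_bits(p):
    """Surprisal of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

def unigram_surprisals(corpus, sentence):
    """Surprisal of each word in `sentence` under an unsmoothed unigram
    model estimated from `corpus` (illustrative; real studies use
    context-conditioned n-gram or neural language models)."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return [surprisal_bits(counts[w] / total) for w in sentence]

corpus = "the cat sat on the mat".split()
print(surprisal_bits(0.25))                      # a word with probability 1/4 conveys 2 bits
print(unigram_surprisals(corpus, ["the", "cat"]))  # rarer words have higher surprisal
```

A contextual model would condition the probability on the preceding words rather than using unigram frequencies; the surprisal formula itself is unchanged.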
Original language: English (US)
Title of host publication: Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)
Editors: Asad Sayeed, Cassandra Jacobs, Tal Linzen, Marten van Schijndel
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 9
Publication status: Published - 2018
Goodkind, A., & Bicknell, K. (2018). Predictive power of word surprisal for reading times is a linear function of language model quality. In A. Sayeed, C. Jacobs, T. Linzen, & M. van Schijndel (Eds.), Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018) (pp. 10-18). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/W18-0102