Predictive power of word surprisal for reading times is a linear function of language model quality

Adam Goodkind, Klinton Bicknell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Within human sentence processing, it is known that there are large effects of a word’s probability in context on how long it takes to read it. This relationship has been quantified using information-theoretic surprisal, or the amount of new information conveyed by a word. Here, we compare surprisals derived from a collection of language models based on n-grams, neural networks, and a combination of both. We show that the models’ psychological predictive power improves as a tight linear function of language model linguistic quality. We also show that the size of the effect of surprisal is estimated consistently across all types of language models. These findings point toward a surprising robustness of surprisal estimates and suggest that surprisals estimated by low-quality language models are not biased.
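
For readers unfamiliar with the measure, surprisal is the negative log probability of a word given its preceding context: surprisal(w_i) = -log2 P(w_i | w_1 ... w_{i-1}). The sketch below is a minimal illustration of this computation, not code from the paper: it assumes a hypothetical smoothed bigram model over a toy corpus, and all names and the smoothing choice are illustrative assumptions.

# Minimal sketch of the surprisal measure discussed in the abstract.
# Surprisal(w_i) = -log2 P(w_i | context); here the context is approximated
# by a single preceding word (bigram model). All names are illustrative.
import math
from collections import Counter

corpus = "the dog ran . the dog barked . the cat ran .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_surprisal(prev: str, word: str, alpha: float = 0.5) -> float:
    """Surprisal in bits of `word` given `prev`, with add-alpha smoothing (an assumption)."""
    vocab_size = len(unigrams)
    prob = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)
    return -math.log2(prob)

# A more predictable continuation carries less information (lower surprisal),
# which the paper links to shorter reading times.
print(bigram_surprisal("the", "dog"))     # frequent bigram -> low surprisal
print(bigram_surprisal("the", "barked"))  # unseen bigram -> high surprisal

The paper’s comparison then asks how well such per-word surprisal values, computed from language models of varying quality, predict human reading times.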
Original language: English (US)
Title of host publication: Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)
Editors: Asad Sayeed, Cassandra Jacobs, Tal Linzen, Marten van Schijndel
Publisher: Association for Computational Linguistics (ACL)
Pages: 10-18
Number of pages: 9
DOI: 10.18653/v1/W18-0102
State: Published - 2018

Cite this

Goodkind, A., & Bicknell, K. (2018). Predictive power of word surprisal for reading times is a linear function of language model quality. In A. Sayeed, C. Jacobs, T. Linzen, & M. van Schijndel (Eds.), Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018) (pp. 10-18). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/W18-0102
Goodkind, Adam; Bicknell, Klinton. / Predictive power of word surprisal for reading times is a linear function of language model quality. Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018). editor / Asad Sayeed; Cassandra Jacobs; Tal Linzen; Marten van Schijndel. Association for Computational Linguistics (ACL), 2018. pp. 10-18
@inproceedings{c906a7ea6c934b3ea54f67a4ee023484,
title = "Predictive power of word surprisal for reading times is a linear function of language model quality",
abstract = "Within human sentence processing, it is known that there are large effects of a word’s probability in context on how long it takes to read it. This relationship has been quantified using information-theoretic surprisal, or the amount of new information conveyed by a word. Here, we compare surprisals derived from a collection of language models based on n-grams, neural networks, and a combination of both. We show that the models’ psychological predictive power improves as a tight linear function of language model linguistic quality. We also show that the size of the effect of surprisal is estimated consistently across all types of language models. These findings point toward a surprising robustness of surprisal estimates and suggest that surprisals estimated by low-quality language models are not biased.",
author = "Adam Goodkind and Klinton Bicknell",
year = "2018",
doi = "10.18653/v1/W18-0102",
language = "English (US)",
pages = "10--18",
editor = "Asad Sayeed and Cassandra Jacobs and Tal Linzen and {van Schijndel}, Marten",
booktitle = "Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)",
publisher = "Association for Computational Linguistics (ACL)",
}

Goodkind, A & Bicknell, K 2018, Predictive power of word surprisal for reading times is a linear function of language model quality. in A Sayeed, C Jacobs, T Linzen & M van Schijndel (eds), Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018). Association for Computational Linguistics (ACL), pp. 10-18. https://doi.org/10.18653/v1/W18-0102

Predictive power of word surprisal for reading times is a linear function of language model quality. / Goodkind, Adam; Bicknell, Klinton.

Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018). ed. / Asad Sayeed; Cassandra Jacobs; Tal Linzen; Marten van Schijndel. Association for Computational Linguistics (ACL), 2018. p. 10-18.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN
T1 - Predictive power of word surprisal for reading times is a linear function of language model quality
AU - Goodkind, Adam
AU - Bicknell, Klinton
PY - 2018
Y1 - 2018
N2 - Within human sentence processing, it is known that there are large effects of a word’s probability in context on how long it takes to read it. This relationship has been quantified using information-theoretic surprisal, or the amount of new information conveyed by a word. Here, we compare surprisals derived from a collection of language models based on n-grams, neural networks, and a combination of both. We show that the models’ psychological predictive power improves as a tight linear function of language model linguistic quality. We also show that the size of the effect of surprisal is estimated consistently across all types of language models. These findings point toward a surprising robustness of surprisal estimates and suggest that surprisals estimated by low-quality language models are not biased.
AB - Within human sentence processing, it is known that there are large effects of a word’s probability in context on how long it takes to read it. This relationship has been quantified using information-theoretic surprisal, or the amount of new information conveyed by a word. Here, we compare surprisals derived from a collection of language models based on n-grams, neural networks, and a combination of both. We show that the models’ psychological predictive power improves as a tight linear function of language model linguistic quality. We also show that the size of the effect of surprisal is estimated consistently across all types of language models. These findings point toward a surprising robustness of surprisal estimates and suggest that surprisals estimated by low-quality language models are not biased.
U2 - 10.18653/v1/W18-0102
DO - 10.18653/v1/W18-0102
M3 - Conference contribution
SP - 10
EP - 18
BT - Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)
A2 - Sayeed, Asad
A2 - Jacobs, Cassandra
A2 - Linzen, Tal
A2 - van Schijndel, Marten
PB - Association for Computational Linguistics (ACL)
ER -

Goodkind A, Bicknell K. Predictive power of word surprisal for reading times is a linear function of language model quality. In Sayeed A, Jacobs C, Linzen T, van Schijndel M, editors, Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018). Association for Computational Linguistics (ACL). 2018. p. 10-18. https://doi.org/10.18653/v1/W18-0102