TY - JOUR
T1 - Language as a biomarker for psychosis
T2 - A natural language processing approach
AU - Corcoran, Cheryl M.
AU - Mittal, Vijay A.
AU - Bearden, Carrie E.
AU - Gur, Raquel E.
AU - Hitczenko, Kasia
AU - Bilgrami, Zarina
AU - Savic, Aleksandar
AU - Cecchi, Guillermo A.
AU - Wolff, Phillip
N1 - Funding Information:
We would like to acknowledge two grants, both to the first author, Cheryl Corcoran: National Institutes of Health 5R01MH107558-06 and 5R01MH115332-03
Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020/12
Y1 - 2020/12
N2 - Human ratings of conceptual disorganization, poverty of content, referential cohesion and illogical thinking have been shown to predict psychosis onset in prospective clinical high risk (CHR) cohort studies. The potential value of linguistic biomarkers has been significantly magnified, however, by recent advances in natural language processing (NLP) and machine learning (ML). Such methodologies allow for the rapid and objective measurement of language features, many of which are not easily recognized by human raters. Here we review the key findings on language production disturbance in psychosis. We also describe recent advances in the computational methods used to analyze language data, including methods for the automatic measurement of discourse coherence, syntactic complexity, poverty of content, referential coherence, and metaphorical language. Linguistic biomarkers of psychosis risk are now undergoing cross-validation, with attention to harmonization of methods. Future directions in extended CHR networks include studies of sources of variance, and combination with other promising biomarkers of psychosis risk, such as cognitive and sensory processing impairments likely to be related to language. Implications for the broader study of social communication, including reciprocal prosody, face expression and gesture, are discussed.
KW - Automated language analysis
KW - Clinical high risk
KW - Digital phenotyping
KW - Discourse coherence
KW - Latent semantic analysis
KW - Machine learning
KW - Natural language processing
KW - Psychosis
KW - Psychosis risk
KW - Referential coherence
KW - Schizophrenia
KW - Semantic coherence
KW - Semantic density
KW - Ultra high risk
UR - http://www.scopus.com/inward/record.url?scp=85085598667&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085598667&partnerID=8YFLogxK
U2 - 10.1016/j.schres.2020.04.032
DO - 10.1016/j.schres.2020.04.032
M3 - Article
C2 - 32499162
AN - SCOPUS:85085598667
SN - 0920-9964
VL - 226
SP - 158
EP - 166
JO - Schizophrenia Research
JF - Schizophrenia Research
ER -