Language as a biomarker for psychosis: A natural language processing approach

Cheryl M. Corcoran, Vijay A. Mittal, Carrie E. Bearden, Raquel E. Gur, Kasia Hitczenko, Zarina Bilgrami, Aleksandar Savic, Guillermo A. Cecchi, Phillip Wolff*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

39 Scopus citations


Human ratings of conceptual disorganization, poverty of content, referential cohesion and illogical thinking have been shown to predict psychosis onset in prospective clinical high risk (CHR) cohort studies. The potential value of linguistic biomarkers has been significantly magnified, however, by recent advances in natural language processing (NLP) and machine learning (ML). Such methodologies allow for the rapid and objective measurement of language features, many of which are not easily recognized by human raters. Here we review the key findings on language production disturbance in psychosis. We also describe recent advances in the computational methods used to analyze language data, including methods for the automatic measurement of discourse coherence, syntactic complexity, poverty of content, referential coherence, and metaphorical language. Linguistic biomarkers of psychosis risk are now undergoing cross-validation, with attention to harmonization of methods. Future directions in extended CHR networks include studies of sources of variance, and combination with other promising biomarkers of psychosis risk, such as cognitive and sensory processing impairments likely to be related to language. Implications for the broader study of social communication, including reciprocal prosody, facial expression and gesture, are discussed.
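As a rough illustration of what the automatic measurement of discourse coherence can look like, the sketch below scores a text by the average cosine similarity between adjacent sentences. It is a toy under stated assumptions, not the method of the reviewed studies: raw word counts stand in for the latent semantic analysis or embedding vectors such work typically relies on, and the function names (`sentence_vectors`, `cosine`, `first_order_coherence`) are hypothetical.

```python
import math
import re
from collections import Counter


def sentence_vectors(text):
    """Split text on ., ! and ? and return one word-count vector per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def first_order_coherence(text):
    """Mean similarity of each sentence to the next one.

    Assumption: word-count vectors are used here only to keep the example
    self-contained; a realistic pipeline would substitute semantic vectors.
    Returns 0.0 when the text has fewer than two sentences.
    """
    vecs = sentence_vectors(text)
    sims = [cosine(u, v) for u, v in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims) if sims else 0.0


coherent = "The dog chased the ball. The dog caught the ball."
tangential = "The dog chased the ball. Taxes are due in April."
print(first_order_coherence(coherent) > first_order_coherence(tangential))
```

In this toy setup, a text whose consecutive sentences share vocabulary scores higher than one that jumps between unrelated topics. Word overlap is only a crude proxy for the semantic relatedness that coherence measures are meant to capture.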

Original language: English (US)
Pages (from-to): 158-166
Number of pages: 9
Journal: Schizophrenia Research
State: Published - Dec 2020


Keywords

  • Automated language analysis
  • Clinical high risk
  • Digital phenotyping
  • Discourse coherence
  • Latent semantic analysis
  • Machine learning
  • Natural language processing
  • Psychosis
  • Psychosis risk
  • Referential coherence
  • Schizophrenia
  • Semantic coherence
  • Semantic density
  • Ultra high risk

ASJC Scopus subject areas

  • Psychiatry and Mental health
  • Biological Psychiatry

