Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing

Sheng Yu*, Kanako K. Kumamaru, Elizabeth George, Ruth M. Dunne, Arash Bedayat, Matey Neykov, Andetta R. Hunsaker, Karin E. Dill, Tianxi Cai, Frank J. Rybicki

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

26 Scopus citations

Abstract

In this paper, we describe an efficient tool based on natural language processing (NLP) for classifying the detailed state of pulmonary embolism (PE) recorded in CT pulmonary angiography reports. The classification tasks include: PE present vs. absent, acute PE vs. others, central PE vs. others, and subsegmental PE vs. others. Statistical learning algorithms were trained with features extracted using the NLP tool and gold standard labels obtained via chart review from two radiologists. The areas under the receiver operating characteristic curves (AUC) for the four tasks were 0.998, 0.945, 0.987, and 0.986, respectively. We compared our classifiers with bag-of-words Naive Bayes classifiers, a standard text mining technology, which gave AUCs of 0.942, 0.765, 0.766, and 0.712, respectively.
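To make the baseline comparison concrete, the following is a minimal sketch of a bag-of-words Naive Bayes classifier scored by AUC on the "PE present vs. absent" task. It is not the authors' implementation and does not reproduce the NLP-based feature extraction the abstract describes. The report snippets, labels, class name `BagOfWordsNB`, and helper names are all hypothetical, chosen only for illustration, and add-one smoothing is an assumption rather than a detail taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase each whitespace-separated token and strip basic punctuation.
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


class BagOfWordsNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_score(self, text, c):
        # log P(c) + sum of log P(word | c) over known words in the text.
        total_words = sum(self.word_counts[c].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / total_docs)
        for tok in tokenize(text):
            if tok in self.vocab:  # words unseen in training are ignored
                smoothed = self.word_counts[c][tok] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
        return score

    def positive_score(self, text):
        # Log-odds of label 1 (PE present) against label 0 (PE absent).
        return self.log_score(text, 1) - self.log_score(text, 0)


def auc(scores, labels):
    # Fraction of positive/negative pairs ranked correctly; ties count half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy reports (1 = PE present, 0 = PE absent); not study data.
train_texts = [
    "acute pulmonary embolism in the right lower lobe",
    "filling defect consistent with pulmonary embolism",
    "no evidence of pulmonary embolism",
    "no filling defect identified",
]
train_labels = [1, 1, 0, 0]
test_texts = ["acute embolism in the left lower lobe", "no embolism identified"]
test_labels = [1, 0]

model = BagOfWordsNB().fit(train_texts, train_labels)
test_scores = [model.positive_score(t) for t in test_texts]
print("AUC on toy test set:", auc(test_scores, test_labels))
```

Because a bag-of-words model treats each word independently, it cannot tell whether "no" negates "embolism" or some other finding in the same report, which is one plausible reason such a baseline would trail a classifier built on structured NLP features.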

Original language: English (US)
Pages (from-to): 386-393
Number of pages: 8
Journal: Journal of Biomedical Informatics
Volume: 52
State: Published - Dec 1 2014

Keywords

  • CT pulmonary angiography
  • NILE
  • Natural language processing
  • Nested modification structure
  • Pulmonary embolism

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications
