Identification of long bone fractures in radiology reports using natural language processing to support healthcare quality improvement

Robert W. Grundmeier*, Aaron J. Masino, T. Charles Casper, Jonathan M. Dean, Jamie Bell, Rene Enriquez, Sara Deakyne, James M. Chamberlain, Elizabeth R. Alpern

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

28 Scopus citations


Background: Important information to support healthcare quality improvement is often recorded in free-text documents such as radiology reports. Natural language processing (NLP) methods may help extract this information, but these methods have rarely been applied outside the research laboratories where they were developed.

Objective: To implement and validate NLP tools that identify long bone fractures for pediatric emergency medicine quality improvement.

Methods: Using freely available statistical software packages, we implemented NLP methods to identify long bone fractures from radiology reports. A sample of 1,000 radiology reports was used to construct three candidate classification models, and a test set of 500 reports was used to validate model performance. Blinded manual review of radiology reports by two independent physicians provided the reference standard. Each radiology report was segmented, and word-stem and bigram features were constructed; common English “stop words” and rare features were excluded. We used 10-fold cross-validation to select optimal configuration parameters for each model. Accuracy, recall, precision, and F1 score were calculated. The final model was compared with diagnosis codes for identifying patients with long bone fractures.

Results: The training documents contained 329 unique word stems and 344 bigrams. A support vector machine classifier with a Gaussian kernel performed best on the test set (accuracy = 0.958, recall = 0.969, precision = 0.940, F1 score = 0.954); its optimal parameters were cost = 4 and gamma = 0.005. All three classification models performed better than diagnosis codes in accuracy, precision, and F1 score (diagnosis codes: accuracy = 0.932, recall = 0.960, precision = 0.896, F1 score = 0.927).

Conclusions: NLP methods trained on a corpus of 1,000 documents accurately identified acute long bone fractures from radiology reports. Strategic use of straightforward NLP methods, implemented with freely available software, offers quality improvement teams new opportunities to extract information from narrative documents.
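
To make the feature-construction step concrete, here is a minimal Python sketch of word-stem and bigram extraction with stop-word and rare-feature filtering. It is not the authors' implementation: the abbreviated stop-word list, the crude suffix-stripping stemmer (standing in for a real stemmer such as Porter's), the `min_df` threshold, the choice to build bigrams after stop-word removal, the example reports, and the names `crude_stem`, `extract_features`, `build_vocabulary`, and `vectorize` are all hypothetical.

```python
import re
from collections import Counter
from itertools import chain

# Hypothetical, abbreviated stop-word list (real English lists are longer).
# Negation words such as "no" are deliberately kept in this sketch.
STOP_WORDS = {"a", "an", "the", "of", "is", "are", "was", "were",
              "and", "or", "in", "on", "to", "with", "there"}

# Hypothetical crude suffix stripper standing in for a real stemmer.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s", "e")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(report):
    """Return word-stem and bigram features for one report."""
    tokens = re.findall(r"[a-z]+", report.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    # Assumption: bigrams are built from adjacent stems after stop-word removal.
    bigrams = [f"{a} {b}" for a, b in zip(stems, stems[1:])]
    return stems + bigrams


def build_vocabulary(reports, min_df=2):
    """Keep features that occur in at least min_df training reports."""
    doc_freq = Counter(chain.from_iterable(set(extract_features(r)) for r in reports))
    return sorted(f for f, n in doc_freq.items() if n >= min_df)


def vectorize(report, vocabulary):
    """Map one report to feature counts over the fixed vocabulary."""
    counts = Counter(extract_features(report))
    return [counts[f] for f in vocabulary]


# Hypothetical example reports (not from the study).
reports = [
    "No acute fracture of the left femur.",
    "Transverse fracture of the distal radius.",
    "Displaced fracture of the right tibia.",
    "No fracture or dislocation identified.",
]

vocabulary = build_vocabulary(reports, min_df=2)
print(vocabulary)                                   # ['fractur', 'no']
print([vectorize(r, vocabulary) for r in reports])  # [[1, 1], [1, 0], [1, 0], [1, 1]]
```

With `min_df=2`, every feature that appears in only one of the four toy reports is dropped. Only the stem `fractur` and the negation `no` survive, and `no` is what separates the two negative reports from the two positive ones, which is why this sketch keeps negation words out of its stop-word list.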
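
The best model used a Gaussian kernel with gamma = 0.005. Assuming the LIBSVM-style parameterization K(x, y) = exp(−gamma · ‖x − y‖²), which freely available SVM packages commonly use, the sketch below shows how that kernel scores the similarity of two report feature vectors. The function name `gaussian_kernel` and the count vectors are hypothetical. The cost = 4 parameter (the penalty on margin violations) is not shown, because it enters only the SVM training step, not the kernel itself.

```python
import math


def gaussian_kernel(x, y, gamma=0.005):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) between two feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical count vectors for two reports over the same vocabulary.
report_a = [1, 1, 0, 2]
report_b = [1, 0, 0, 0]

print(gaussian_kernel(report_a, report_a))  # 1.0 (identical reports)
print(gaussian_kernel(report_a, report_b))  # exp(-0.005 * 5), about 0.9753
```

With such a small gamma, similarity decays slowly: each unit of squared distance between two reports multiplies the kernel value by only exp(−0.005), about 0.995.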
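
Accuracy, recall, precision, and F1 follow the standard definitions over true/false positives and negatives. Because F1 is the harmonic mean of precision and recall, the F1 scores reported above can be recomputed from the reported precision and recall as a consistency check. The helper names and the confusion-matrix counts below are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)      # share of true fractures that were flagged
    precision = tp / (tp + fp)   # share of flagged reports that were true fractures
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1_score(precision, recall)}


# Recompute the reported F1 scores from the reported precision and recall.
print(round(f1_score(0.940, 0.969), 3))  # 0.954 (SVM, Gaussian kernel)
print(round(f1_score(0.896, 0.960), 3))  # 0.927 (diagnosis codes)

# Hypothetical confusion-matrix counts, for illustration only.
print(classification_metrics(tp=8, fp=1, fn=2, tn=9))
```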

Original language: English (US)
Pages (from-to): 1051-1068
Number of pages: 18
Journal: Applied Clinical Informatics
Issue number: 4
State: Published - 2016

Keywords

  • Emergency medicine
  • Machine learning
  • Natural language processing
  • Pediatrics
  • Quality improvement

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications
  • Health Information Management
