TY - JOUR
T1 - Identification of long bone fractures in radiology reports using natural language processing to support healthcare quality improvement
AU - Grundmeier, Robert W.
AU - Masino, Aaron J.
AU - Charles Casper, T.
AU - Dean, Jonathan M.
AU - Bell, Jamie
AU - Enriquez, Rene
AU - Deakyne, Sara
AU - Chamberlain, James M.
AU - Alpern, Elizabeth R.
N1 - Publisher Copyright: © Schattauer 2016.
PY - 2016
Y1 - 2016
AB - Background: Important information to support healthcare quality improvement is often recorded in free text documents such as radiology reports. Natural language processing (NLP) methods may help extract this information, but these methods have rarely been applied outside the research laboratories where they were developed. Objective: To implement and validate NLP tools to identify long bone fractures for pediatric emergency medicine quality improvement. Methods: Using freely available statistical software packages, we implemented NLP methods to identify long bone fractures from radiology reports. A sample of 1,000 radiology reports was used to construct three candidate classification models. A test set of 500 reports was used to validate the model performance. Blinded manual review of radiology reports by two independent physicians provided the reference standard. Each radiology report was segmented and word stem and bigram features were constructed. Common English “stop words” and rare features were excluded. We used 10-fold cross-validation to select optimal configuration parameters for each model. Accuracy, recall, precision and the F1 score were calculated. The final model was compared to the use of diagnosis codes for the identification of patients with long bone fractures. Results: There were 329 unique word stems and 344 bigrams in the training documents. A support vector machine classifier with Gaussian kernel performed best on the test set with accuracy=0.958, recall=0.969, precision=0.940, and F1 score=0.954. Optimal parameters for this model were cost=4 and gamma=0.005. The three classification models that we tested all performed better than diagnosis codes in terms of accuracy, precision, and F1 score (diagnosis code accuracy=0.932, recall=0.960, precision=0.896, and F1 score=0.927). Conclusions: NLP methods using a corpus of 1,000 training documents accurately identified acute long bone fractures from radiology reports. Strategic use of straightforward NLP methods, implemented with freely available software, offers quality improvement teams new opportunities to extract information from narrative documents.
KW - Emergency medicine
KW - Machine learning
KW - Natural language processing
KW - Pediatrics
KW - Quality improvement
UR - http://www.scopus.com/inward/record.url?scp=84995428307&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84995428307&partnerID=8YFLogxK
U2 - 10.4338/ACI-2016-08-RA-0129
DO - 10.4338/ACI-2016-08-RA-0129
M3 - Article
C2 - 27826610
AN - SCOPUS:84995428307
SN - 1869-0327
VL - 7
SP - 1051
EP - 1068
JO - Applied Clinical Informatics
JF - Applied Clinical Informatics
IS - 4
ER -