Abstract
Introduction: Trauma injury severity scores are currently calculated retrospectively from the electronic health record (EHR) through manual annotation by certified trauma coders. Natural language processing (NLP) of clinical documents in the EHR may enable automated injury scoring. We hypothesize that NLP with machine learning can discriminate between cases of severe and non-severe injury to the thorax after trauma.

Methods: Clinical documents recorded at a trauma center between 2014 and 2018 were examined. Severe chest injury was defined as a thorax Abbreviated Injury Scale (AIS) score >2 and served as the reference standard for supervised learning. Free-text unigrams and concept unique identifiers (CUIs) from the Unified Medical Language System (UMLS) were extracted from clinical documents collected at one hour, four hours, and eight hours after patient arrival to the emergency department. Logistic regression models with elastic net regularization were tuned to maximize area under the receiver operating characteristic curve (AUROC) using 10-fold cross-validation on the training dataset (80%) and tested on a hold-out dataset (20%).

Results: There were 6,891 traumas that met inclusion criteria. The complete data corpus consisted of 473,694 documents. Models trained using the first hour of data had a mean AUROC of 0.88 (95% CI [0.86, 0.89]); model discrimination and reclassification improved significantly from the first hour to eight hours, with a mean AUROC of 0.94 (95% CI [0.93, 0.95]). Performance of models using CUIs was similar to that of models using unigrams (p>0.05). Models demonstrated excellent clinical face validity.

Conclusions: Both CUIs and unigrams demonstrated excellent discrimination in predicting severity of chest injury using the first eight hours of clinical documents. Our model demonstrates that automated anatomical injury scoring is feasible and may be used for aggregation of data for trauma research and quality programs.
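To make the evaluation terms in the abstract concrete, the following is a minimal sketch of the binary reference label (thorax AIS >2) and of an AUROC with a 95% confidence interval. It is not the authors' code. The abstract does not say how the confidence intervals were obtained, so the percentile bootstrap here is an assumption, and the function names (`label_severe`, `auroc`, `bootstrap_ci`) and the toy scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

AIS_SEVERE_THRESHOLD = 2  # from the abstract: severe chest injury is thorax AIS > 2


def label_severe(thorax_ais: int) -> int:
    """Binary reference label: 1 if thorax AIS > 2, otherwise 0."""
    return 1 if thorax_ais > AIS_SEVERE_THRESHOLD else 0


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case outscores
    a random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUROC (an assumed method,
    not one stated in the abstract)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * (n_boot - 1))]
    upper = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


if __name__ == "__main__":
    # Made-up thorax AIS values and model scores, for illustration only.
    ais_values = [0, 1, 3, 4, 2, 5, 0, 3]
    model_scores = [0.10, 0.30, 0.70, 0.90, 0.60, 0.80, 0.20, 0.55]
    y = [label_severe(a) for a in ais_values]
    print("AUROC:", auroc(y, model_scores))
    print("95% CI:", bootstrap_ci(y, model_scores, n_boot=500))
```

The pairwise comparison is quadratic in the number of cases, which is fine for a sketch; a rank-based formulation would be preferable for a hold-out set of realistic size.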
Original language | English (US) |
---|---|
Pages (from-to) | 205-212 |
Number of pages | 8 |
Journal | Injury |
Volume | 52 |
Issue number | 2 |
State | Published - Feb 2021 |
Funding
Drs. Afshar, Churpek, Dligach, and Kulshrestha received support for article research from the National Institutes of Health (NIH). Dr. Churpek received funding from the National Institute of General Medical Sciences (NIGMS) under R01 GM123193 and research support from EarlySense (Tel Aviv, Israel), and he has a patent pending (ARCD. P0535US.P2) for risk stratification algorithms for hospitalized patients. Dr. Dligach is supported by the National Library of Medicine of the National Institutes of Health under Award Numbers R01LM012973 and R01LM010090. Dr. Kulshrestha is supported by National Institutes of Health T32 NIAAA 5T32AA013527–17. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The remaining authors have disclosed that they do not have any potential conflicts of interest.
Keywords
- Machine learning
- Natural language processing
- Trauma
- Trauma registry
ASJC Scopus subject areas
- Emergency Medicine
- Orthopedics and Sports Medicine