TY - JOUR
T1 - Development and validation of MicrobEx
T2 - an open-source package for microbiology culture concept extraction
AU - Eickelberg, Garrett
AU - Luo, Yuan
AU - Sanchez-Pinto, L. Nelson
N1 - Publisher Copyright:
© 2022 The Author(s). Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2022/7/1
Y1 - 2022/7/1
N2 - Objective: Microbiology culture reports contain critical information for important clinical and public health applications. However, microbiology reports often have complex, semistructured, free-text data that present a barrier for secondary use. Here we present the development and validation of an open-source package designed to ingest free-text microbiology reports, determine whether the culture is positive, and return a list of Systemized Nomenclature of Medicine (SNOMED)-CT mapped bacteria. Materials and Methods: Our concept extraction Python package, MicrobEx, is built upon a rule-based natural language processing algorithm and was developed using microbiology reports from 2 different electronic health record systems in a large healthcare organization, and then externally validated on the reports of 2 other institutions with manually reviewed results as a benchmark. Results: MicrobEx achieved F1 scores >0.95 on all classification tasks across 2 independent validation sets with minimal customization. Additionally, MicrobEx matched or surpassed our MetaMap-based benchmark algorithm performance across positive culture classification and species capture classification tasks. Discussion: Our results suggest that MicrobEx can be used to reliably estimate binary bacterial culture status, extract bacterial species, and map these to SNOMED organism observations when applied to semistructured, free-text microbiology reports from different institutions with relatively low customization. Conclusion: MicrobEx offers an open-source software solution (available on both GitHub and PyPI) for bacterial culture status estimation and bacterial species extraction from free-text microbiology reports. The package was designed to be reused and adapted to individual institutions as an upstream process for other clinical applications such as: machine learning, clinical decision support, and disease surveillance systems.
AB - Objective: Microbiology culture reports contain critical information for important clinical and public health applications. However, microbiology reports often have complex, semistructured, free-text data that present a barrier for secondary use. Here we present the development and validation of an open-source package designed to ingest free-text microbiology reports, determine whether the culture is positive, and return a list of Systemized Nomenclature of Medicine (SNOMED)-CT mapped bacteria. Materials and Methods: Our concept extraction Python package, MicrobEx, is built upon a rule-based natural language processing algorithm and was developed using microbiology reports from 2 different electronic health record systems in a large healthcare organization, and then externally validated on the reports of 2 other institutions with manually reviewed results as a benchmark. Results: MicrobEx achieved F1 scores >0.95 on all classification tasks across 2 independent validation sets with minimal customization. Additionally, MicrobEx matched or surpassed our MetaMap-based benchmark algorithm performance across positive culture classification and species capture classification tasks. Discussion: Our results suggest that MicrobEx can be used to reliably estimate binary bacterial culture status, extract bacterial species, and map these to SNOMED organism observations when applied to semistructured, free-text microbiology reports from different institutions with relatively low customization. Conclusion: MicrobEx offers an open-source software solution (available on both GitHub and PyPI) for bacterial culture status estimation and bacterial species extraction from free-text microbiology reports. The package was designed to be reused and adapted to individual institutions as an upstream process for other clinical applications such as: machine learning, clinical decision support, and disease surveillance systems.
KW - concept extraction
KW - electronic health records
KW - information extraction
KW - microbiology report
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85134966657&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134966657&partnerID=8YFLogxK
U2 - 10.1093/jamiaopen/ooac026
DO - 10.1093/jamiaopen/ooac026
M3 - Article
C2 - 35651524
AN - SCOPUS:85134966657
SN - 2574-2531
VL - 5
JO - JAMIA Open
JF - JAMIA Open
IS - 2
M1 - ooac026
ER -