Development and validation of MicrobEx: an open-source package for microbiology culture concept extraction

Garrett Eickelberg, Yuan Luo, L. Nelson Sanchez-Pinto*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Objective: Microbiology culture reports contain critical information for important clinical and public health applications. However, microbiology reports often have complex, semistructured, free-text data that present a barrier for secondary use. Here we present the development and validation of an open-source package designed to ingest free-text microbiology reports, determine whether the culture is positive, and return a list of bacteria mapped to Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT).

Materials and Methods: Our concept extraction Python package, MicrobEx, is built upon a rule-based natural language processing algorithm. It was developed using microbiology reports from 2 different electronic health record systems in a large healthcare organization and then externally validated on the reports of 2 other institutions, with manually reviewed results as a benchmark.

Results: MicrobEx achieved F1 scores >0.95 on all classification tasks across 2 independent validation sets with minimal customization. Additionally, MicrobEx matched or surpassed the performance of our MetaMap-based benchmark algorithm on the positive culture classification and species capture classification tasks.

Discussion: Our results suggest that MicrobEx can be used to reliably estimate binary bacterial culture status, extract bacterial species, and map these to SNOMED organism observations when applied to semistructured, free-text microbiology reports from different institutions with relatively low customization.

Conclusion: MicrobEx offers an open-source software solution (available on both GitHub and PyPI) for bacterial culture status estimation and bacterial species extraction from free-text microbiology reports. The package was designed to be reused and adapted to individual institutions as an upstream process for other clinical applications such as machine learning, clinical decision support, and disease surveillance systems.
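To make the task in the abstract concrete, the following is a minimal sketch of rule-based culture concept extraction in Python. It is not MicrobEx's actual API or rule set: the function name, the regular expressions, and the concept identifiers (deliberate placeholders, not real SNOMED CT codes) are all assumptions chosen for illustration. The sketch also simplifies by treating a culture as positive only when a named species is found on a non-negated line.

```python
import re

# Hypothetical lookup: regex for a species mention -> (canonical name, concept ID).
# The IDs are placeholders; a real system would store SNOMED CT concept IDs here.
SPECIES_PATTERNS = {
    r"\b(?:escherichia|e\.)\s*coli\b": ("Escherichia coli", "PLACEHOLDER_ID_1"),
    r"\b(?:staphylococcus|s\.)\s*aureus\b": ("Staphylococcus aureus", "PLACEHOLDER_ID_2"),
    r"\bpseudomonas\s+aeruginosa\b": ("Pseudomonas aeruginosa", "PLACEHOLDER_ID_3"),
}

# Hypothetical negation cues; a line matching any of these is ignored.
NEGATIVE_PATTERNS = [
    r"\bno growth\b",
    r"\bno\b.*\bisolated\b",
    r"\bnot (?:isolated|detected)\b",
    r"\bnegative for\b",
]


def extract_culture_concepts(report):
    """Return culture status and mapped species for one free-text report."""
    species = []
    seen = set()
    # Process the report line by line so a negated line cannot
    # contribute species mentions.
    for line in report.lower().splitlines():
        if any(re.search(neg, line) for neg in NEGATIVE_PATTERNS):
            continue
        for pattern, (name, concept_id) in SPECIES_PATTERNS.items():
            if name not in seen and re.search(pattern, line):
                seen.add(name)
                species.append({"name": name, "concept_id": concept_id})
    # Simplification: positive only if at least one species was captured.
    return {"positive": bool(species), "species": species}


if __name__ == "__main__":
    example = (
        "URINE CULTURE\n"
        ">100,000 CFU/mL Escherichia coli\n"
        "No Staphylococcus aureus isolated"
    )
    print(extract_culture_concepts(example))
```

A production rule set would need far broader coverage (abbreviations, contaminants, quantitation thresholds, positive cultures without a named species), which is the kind of institution-specific customization the abstract refers to.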

Original language: English (US)
Article number: ooac026
Journal: JAMIA Open
Issue number: 2
State: Published - Jul 1 2022


Keywords

  • concept extraction
  • electronic health records
  • information extraction
  • microbiology report
  • natural language processing

ASJC Scopus subject areas

  • Health Informatics

