Bayesian Generative Methods for Extracting and Modeling Relations in EHR Narratives

Project: Research project

Project Details


Medicine has entered an era in which entire hospitals are progressively adopting intensive real-time patient monitoring and generating ICU-like clinical data. This rapidly growing volume of data makes the ICU a snapshot of tomorrow’s standard of care, one that should benefit from computer-aided decision making. These data include numerical or coded information, but the majority is unstructured narrative text, such as physicians’ and nurses’ notes, specialists’ reports, and discharge summaries. Both types of data have been shown to be highly informative for tasks such as cohort selection, and they work best in combination. To combine them, however, specific pieces of information must be extracted from the narrative reports and coded in a formal representation. These include medical concepts such as symptoms, diseases, medications, and procedures; characteristics such as certainty, severity, and dose; assertions about these items, such as whether they pertain to the patient or a family member; and relations among these mentions, including indications of which condition is treated by which action and with what degree of success, the time sequence and duration of events, and interpretations of laboratory test results as relations among medical concepts such as cells and antigens (e.g., “[large atypical cells] express [CD30]”). Concepts and assertions can be regarded as simple relations, and our proposal focuses on modeling narrative relations as information complementary to numeric data for predicting patient outcomes.

Most existing techniques for interpreting clinical narratives either rely on hand-crafted rule systems and large medical thesauri or use machine learning to build classification or regression models from large annotated data sets. The former are difficult and laborious to generalize, whereas the latter require large volumes of human-labeled data and may yield models whose operation is difficult to interpret and which are therefore considered untrustworthy. We propose to build on our previous work on unsupervised learning methods that identify frequent patterns in un-annotated narratives and select the informative ones by tensor factorization. These methods can also identify patterns that, though meaningful in a data-driven sense, are difficult for clinicians to understand.

Our specific goal is to develop a novel method based on a Bayesian generative model that integrates relation mining with tensor factorization to learn patterns that both correspond to an understanding of the clinical domain and reliably predict patient outcomes (e.g., mortality and re-admission).
Aim 1: Develop a Bayesian generative framework that automatically extracts and groups relations for modeling EHR narratives.
Aim 2: Validate the impact of the new models in augmenting structured features for predicting patients’ mortality and re-admission risks on retrospectively collected ICU patient data from both a public source and the Northwestern Medicine Enterprise Data Warehouse.
Effective start/end date: 9/1/17 to 8/31/20


  • National Library of Medicine (1R21LM012618-01)

