Unsupervised phenotyping of sepsis using nonnegative matrix factorization of temporal trends from a multivariate panel of physiological measurements

Menghan Ding, Yuan Luo*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

11 Scopus citations


Background: Sepsis is a highly lethal and heterogeneous disease. Unsupervised methods may identify novel clinical phenotypes that lead to targeted therapies and improved care.

Methods: Our objective was to derive clinically relevant sepsis phenotypes from a multivariate panel of physiological data using subgraph-augmented nonnegative matrix factorization. We used data from the Medical Information Mart for Intensive Care III (MIMIC-III) database on patients admitted to the intensive care unit with sepsis. The extracted data contained patient demographics, physiological records, Sequential Organ Failure Assessment (SOFA) scores, and comorbidities. We applied frequent subgraph mining to extract subgraphs from physiological time series and performed nonnegative matrix factorization over the subgraphs to derive patient clusters as phenotypes. Finally, we profiled these phenotypes by demographics, physiological patterns, disease trajectories, comorbidities, and outcomes, and performed functional validation of their clinical implications.

Results: We analyzed a cohort of 5782 patients, derived three novel phenotypes with distinct clinical characteristics, and demonstrated their prognostic implications for patient outcome. Subgroup 1 included patients of relatively lower severity and mortality (30-day mortality, 17%) and was the smallest group (n = 1218, 21%). It was characterized by old age (mean age, 73 years), a male majority (male-to-female ratio, 59-to-41), and complex chronic conditions. Subgroup 2 included the most severe and deadliest cases (30-day mortality, 28%) and was the second-largest group (n = 2036, 35%). It was characterized by a male majority (male-to-female ratio, 60-to-40), severe organ dysfunction or failure compounded by a wide range of comorbidities, and uniquely high incidences of coagulopathy and liver disease. Subgroup 3 included the least severe and least deadly cases (30-day mortality, 10%) and was the largest group (n = 2528, 44%). It was characterized by younger age (mean age, 60 years), a balanced gender ratio (male-to-female ratio, 50-to-50), the least complicated conditions, and a uniquely high incidence of neurologic disease. These phenotypes were validated as prognostic factors of mortality for sepsis patients.

Conclusions: Our results suggest that these phenotypes can be used to develop targeted therapies based on phenotypic heterogeneity, as well as algorithms for monitoring, validating, and intervening in clinical decisions for sepsis patients.
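The factorization-and-clustering step named in the Methods can be illustrated with a minimal sketch. It is not the authors' subgraph-augmented implementation: it assumes a plain nonnegative matrix factorization with Lee-Seung multiplicative updates, a hypothetical toy patient-by-subgraph count matrix `V`, and a simple rule that assigns each patient to the factor with the largest loading. The names `nmf` and `assign_phenotypes` are illustrative only.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (patients x features) as W @ H.

    Uses Lee-Seung multiplicative updates for the squared Frobenius error.
    This is a generic NMF, assumed here for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps  # patient loadings on each factor
    H = rng.random((rank, m)) + eps  # factor loadings on each feature
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def assign_phenotypes(W):
    """Assign each patient (row of W) to its dominant factor."""
    return W.argmax(axis=1)


# Hypothetical toy data: 6 patients x 4 subgraph-count features.
# Patients 0-2 mostly exhibit features 0-1; patients 3-5 features 2-3.
V = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [6, 5, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 4, 3],
    [0, 0, 5, 5],
], dtype=float)

W, H = nmf(V, rank=2)
labels = assign_phenotypes(W)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

print("cluster labels:", labels)
print("relative reconstruction error:", round(float(rel_error), 3))
```

In this sketch the rows of `H` play the role of phenotype signatures over the features, and the rows of `W` give each patient's membership strength. The number of factors (`rank`) is a modeling choice; the abstract reports three phenotypes, whereas the toy example uses two only to keep the data small.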

Original language: English (US)
Article number: 95
Journal: BMC Medical Informatics and Decision Making
State: Published - Apr 2021


Keywords

  • Clustering
  • Frequent subgraph mining
  • Gradient boosting machine
  • Intensive care unit
  • Nonnegative matrix factorization
  • Phenotyping
  • Physiological measurements
  • Sepsis
  • Unsupervised learning

ASJC Scopus subject areas

  • Health Policy
  • Health Informatics

