Pulmonary emphysema subtypes defined by unsupervised machine learning on CT scans

Elsa D. Angelini, Jie Yang, Pallavi P. Balte, Eric A. Hoffman, Ani W. Manichaikul, Yifei Sun, Wei Shen, John H.M. Austin, Norrina B. Allen, Eugene R. Bleecker, Russell Bowler, Michael H. Cho, Christopher S. Cooper, David Couper, Mark T. Dransfield, Christine Kim Garcia, Meilan K. Han, Nadia N. Hansel, Emlyn Hughes, David R. Jacobs, Silva Kasela, Joel Daniel Kaufman, John Shinn Kim, Tuuli Lappalainen, Joao Lima, Daniel Malinsky, Fernando J. Martinez, Elizabeth C. Oelsner, Victor E. Ortega, Robert Paine, Wendy Post, Tess D. Pottinger, Martin R. Prince, Stephen S. Rich, Edwin K. Silverman, Benjamin M. Smith, Andrew J. Swift, Karol E. Watson, Prescott G. Woodruff, Andrew F. Laine, R. Graham Barr*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


Background: Treatment and preventative advances for chronic obstructive pulmonary disease (COPD) have been slow due, in part, to limited subphenotypes. We tested whether unsupervised machine learning on CT images would discover CT emphysema subtypes with distinct characteristics, prognoses and genetic associations.

Methods: New CT emphysema subtypes were identified by unsupervised machine learning on only the texture and location of emphysematous regions on CT scans from 2853 participants in the Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS), a COPD case-control study, followed by data reduction. Subtypes were compared with symptoms and physiology among 2949 participants in the population-based Multi-Ethnic Study of Atherosclerosis (MESA) Lung Study and with prognosis among 6658 MESA participants. Associations with genome-wide single-nucleotide polymorphisms were examined.

Results: The algorithm discovered six reproducible (interlearner intraclass correlation coefficient, 0.91-1.00) CT emphysema subtypes. The most common subtype in SPIROMICS, the combined bronchitis-apical subtype, was associated with chronic bronchitis, accelerated lung function decline, hospitalisations, deaths, incident airflow limitation and a gene variant near DRD1, which is implicated in mucin hypersecretion (p=1.1×10⁻⁸). The second, the diffuse subtype, was associated with lower weight, respiratory hospitalisations and deaths, and incident airflow limitation. The third was associated with age only. The fourth and fifth visually resembled combined pulmonary fibrosis emphysema and had distinct symptoms, physiology, prognosis and genetic associations. The sixth visually resembled vanishing lung syndrome.

Conclusion: Large-scale unsupervised machine learning on CT scans defined six reproducible, familiar CT emphysema subtypes that suggest paths to specific diagnosis and personalised therapies in COPD and pre-COPD.

Original language: English (US)
Pages (from-to): 1067-1079
Number of pages: 13
Issue number: 11
State: Published - Nov 1 2023


  • COPD epidemiology
  • Emphysema
  • Imaging/CT MRI etc

ASJC Scopus subject areas

  • Pulmonary and Respiratory Medicine

