Modeling major lung resection outcomes using classification trees and multiple imputation techniques

Mark K. Ferguson*, Juned Siddique, Theodore Karrison

*Corresponding author for this work

Research output: Contribution to journal (article, peer-reviewed)

27 Scopus citations


Objective: Modeling of operative risks associated with major lung resection is potentially inaccurate and inefficient because of incomplete observations for predictor variables (covariates). Missing values do not usually occur randomly, potentially introducing an important source of bias in modeling. Deletion of cases with missing data also results in loss of precision. The current study analyzes incomplete variables as potential predictors of outcomes after major lung resection using imputation techniques.

Methods: We analyzed major lung resection patients treated from 1980 to 2006 for predictors of pulmonary, cardiovascular, and overall complications, as well as mortality. Predictive variables were initially determined using classification and regression tree (CART) methods. Imputation models were developed and variables with missing values were multiply imputed. We fit a logistic regression model for each outcome using CART variables and any covariates that were of interest clinically.

Results: Of 1046 resected patients, serum albumin and diffusing capacity (DLCO%) had a large number of missing values (32% and 13% missing, respectively). Models included 10 covariates for pulmonary complications (p < 0.05 for DLCO% and forced expiratory volume in the first second [FEV1%]), 12 covariates for cardiovascular complications (p < 0.05 for FEV1%, extent of resection, year of operation, and age), 15 covariates for overall complications (p < 0.05 for DLCO%, performance status, serum albumin, and FEV1/FVC ratio), and 12 covariates for death (p < 0.05 for DLCO%, extent of resection, and operation year).

Conclusions: We identified serum albumin as a previously under-reported and strong predictor of overall complications. Serum albumin also showed a marginally significant association with pulmonary and cardiovascular outcomes after major lung surgery. Use of imputation techniques for modeling surgical risks has potential value in identifying important predictive variables that may ordinarily be eliminated from analysis or not identified as predictors because of incomplete observations in clinical databases.
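
The Methods describe fitting a logistic regression model after multiply imputing the incomplete covariates. The abstract does not say how results were combined across the imputed datasets; a common approach is Rubin's rules, sketched below as an illustration only. The function name `pool_rubin` and the numbers are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the coefficient estimated in each imputed dataset
    variances: the squared standard error of that coefficient in each dataset
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, sqrt(total)


# Hypothetical log-odds coefficients for a single covariate (for example
# serum albumin) from five imputed datasets. Illustrative values only.
estimates = [-0.50, -0.40, -0.60, -0.45, -0.55]
squared_ses = [0.04, 0.05, 0.04, 0.05, 0.04]

coef, se = pool_rubin(estimates, squared_ses)
print(f"pooled coefficient = {coef:.3f}, pooled SE = {se:.3f}")
```

The pooled standard error is larger than a single-dataset standard error whenever the imputations disagree, which is how the uncertainty due to missing values carries through to the final p-values.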

Original language: English (US)
Pages (from-to): 1085-1089
Number of pages: 5
Journal: European Journal of Cardio-thoracic Surgery
Issue number: 5
State: Published - Nov 2008


  • Classification and regression tree
  • Diffusing capacity
  • Imputation
  • Lung
  • Neoplasm
  • Serum albumin
  • Surgical risk

ASJC Scopus subject areas

  • Surgery
  • Pulmonary and Respiratory Medicine
  • Cardiology and Cardiovascular Medicine

