A semi-parametric approach to impute mixed continuous and categorical data

Irene B. Helenowski*, Hakan Demirtas, Michael F. McGee

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

We propose an extension of the method presented in Helenowski and Demirtas (2013), which imputes mixed continuous and binary data, to data involving categorical variables with three or more levels. Here, ‘bivariate’ indicates that the data set includes two variables, while ‘multivariate’ indicates that it includes three or more variables. In the bivariate case, the median of the continuous variable is computed within each level of the categorical variable, and the categorical variable is recoded as an ordinal variable according to these medians, so that the ordinal level assigned to each category is the rank of that category's median. In the multivariate case, the categorical variables are ordered with respect to the continuous variable for which the range among the medians is the largest. The pairwise correlation between the continuous and ordinal variables is then computed. The data are then transformed to normally distributed values, imputed via joint modeling, and back-transformed to the original scale, using the Barton and Schruben (1993) technique for the continuous variable and quantiles based on the original probabilities for the categorical variable. The algorithm is iterated until the absolute difference between the pairwise correlations of the original and imputed data is less than some constant c, chosen to maximize the coverage rate and minimize the standardized bias. Results from simulations applied to artificial data and to real data involving 74 colorectal patients indicate that our technique is promising.
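The median-ranking step for the bivariate case can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, the choice to skip incomplete pairs, the tie-breaking rule, and the use of the Pearson coefficient as the "pairwise correlation" are all assumptions made for illustration. The transformation, joint-model imputation, and back-transformation steps are not shown.

```python
from statistics import median


def ordinal_codes_by_median(categories, values):
    """Map each category level to an ordinal code (1 = lowest median).

    Pairs with a missing entry (None) are skipped; this is an assumption,
    the abstract does not say how incomplete pairs enter the ranking.
    """
    groups = {}
    for cat, val in zip(categories, values):
        if cat is None or val is None:
            continue
        groups.setdefault(cat, []).append(val)
    medians = {cat: median(vals) for cat, vals in groups.items()}
    # Ties between medians are broken by the level's label (assumption).
    ordered = sorted(medians, key=lambda c: (medians[c], str(c)))
    return {cat: rank for rank, cat in enumerate(ordered, start=1)}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy data: one three-level categorical, one continuous variable.
categories = ["B", "A", "C", "A", "B", "C", "A", "C"]
values = [5.0, 1.0, 3.0, 2.0, 6.0, 2.5, 1.5, 3.5]

codes = ordinal_codes_by_median(categories, values)
ordinal = [codes[c] for c in categories]
r = pearson(ordinal, values)
print(codes, round(r, 3))
```

In this toy data the group medians are 1.5 for A, 3.0 for C, and 5.5 for B, so the levels are coded A = 1, C = 2, B = 3 before the correlation is taken.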

Original language: English (US)
Pages (from-to): 183-193
Number of pages: 11
Journal: Health Services and Outcomes Research Methodology
Volume: 14
Issue number: 4
State: Published - Nov 18 2014

Keywords

  • Categorical data
  • Multiple imputation
  • Ordinal data
  • Semi-parametric

ASJC Scopus subject areas

  • Health Policy
  • Public Health, Environmental and Occupational Health
