Abstract
We propose an extension of the method of Helenowski and Demirtas (2013), which imputes mixed continuous and binary data, to data that include categorical variables with three or more levels. In the bivariate case, the median of the continuous variable is computed within each level of the categorical variable, and the categorical variable is then recoded as an ordinal variable whose levels follow the rank order of these medians. In the multivariate case, the categorical variables are ordered with respect to the continuous variable for which the range among the medians is largest. Here, ‘bivariate’ indicates that the data set includes two variables, while ‘multivariate’ indicates that it includes three or more. The pairwise correlation between the continuous and ordinal variables is then computed. The data are transformed to normally distributed values, imputed via joint modeling, and back-transformed to the original scale, using the Barton and Schruben (1993) technique for the continuous variable and quantiles based on the original probabilities for the categorical variable. The algorithm is iterated until the absolute difference between the pairwise correlations in the original and imputed data is less than some constant c, chosen to maximize the coverage rate and minimize the standardized bias. Results from simulations applied to artificial data and to real data involving 74 colorectal patients indicate that our technique is promising.
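The median-ranking step described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the function names (`rank_categories_by_median`, `pick_ordering_variable`, `pearson`) and the toy data are hypothetical. The abstract does not say which correlation coefficient is used, so Pearson is an assumption here, as is the alphabetical tie-breaking between categories with equal medians. The normal transformation, joint-model imputation, and back-transformation steps are not shown.

```python
from statistics import median


def group_medians(categories, values):
    """Median of the continuous variable within each category (complete pairs only)."""
    groups = {}
    for cat, val in zip(categories, values):
        if cat is None or val is None:
            continue  # skip pairs with a missing entry
        groups.setdefault(cat, []).append(val)
    return {cat: median(vals) for cat, vals in groups.items()}


def rank_categories_by_median(categories, values):
    """Map each category to an ordinal level (1 = smallest median).

    Ties are broken by the label's string form; this is an assumption,
    not something specified in the abstract.
    """
    medians = group_medians(categories, values)
    ordered = sorted(medians, key=lambda c: (medians[c], str(c)))
    return {cat: level for level, cat in enumerate(ordered, start=1)}


def pick_ordering_variable(categories, continuous):
    """Multivariate case: name of the continuous variable whose
    category medians have the largest range."""
    def median_range(vals):
        meds = group_medians(categories, vals).values()
        return max(meds) - min(meds)
    return max(continuous, key=lambda name: median_range(continuous[name]))


def pearson(x, y):
    """Pearson correlation (assumed choice of pairwise correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy data (hypothetical): one categorical and two continuous variables.
categories = ["a", "b", "c", "a", "b", "c"]
continuous = {
    "x": [5.0, 1.0, 3.0, 7.0, 2.0, 4.0],
    "z": [1.0, 1.1, 1.0, 1.2, 0.9, 1.1],
}

chosen = pick_ordering_variable(categories, continuous)
levels = rank_categories_by_median(categories, continuous[chosen])
ordinal = [levels[c] for c in categories]
r = pearson(ordinal, continuous[chosen])
```

In this toy example the medians of `x` by category are 6 for `a`, 1.5 for `b`, and 3.5 for `c`, so the categories are recoded as `b` = 1, `c` = 2, `a` = 3 before the correlation is computed.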
Original language | English (US) |
---|---|
Pages (from-to) | 183-193 |
Number of pages | 11 |
Journal | Health Services and Outcomes Research Methodology |
Volume | 14 |
Issue number | 4 |
State | Published - Nov 18 2014 |
Keywords
- Categorical data
- Multiple imputation
- Ordinal data
- Semi-parametric
ASJC Scopus subject areas
- Health Policy
- Public Health, Environmental and Occupational Health