Abstract: In observational studies, selection bias will be completely removed only if the selection mechanism is ignorable, that is, if all confounders of treatment selection and potential outcomes are reliably measured. Ideally, well-grounded substantive theories about the selection process and outcome-generating model are used to generate the sample of covariates. However, covariate selection is more heuristic in actual practice. Using two empirical data sets in a simulation study, we investigate four research questions about bias reduction when the selection mechanism is not known but many covariates are measured: (1) How important is the conceptual heterogeneity of the covariate domains in the data set? (2) How important is the number of covariates assessing each domain? (3) What are the joint effects of this conceptual heterogeneity and of the number of covariates per domain? (4) What happens to bias reduction when the set of covariates is deliberately impoverished by removing the covariates most responsible for selection bias, thus ensuring a slightly smaller but still heterogeneous set of covariates? The results indicate: (1) more bias is removed as the number of covariate domains and the number of covariates per domain increase, though the rate of bias reduction diminishes in each case; (2) sampling covariates from multiple heterogeneous covariate domains is more important than choosing many measures from fewer domains; (3) the most heterogeneous set of covariate domains removes almost all of the selection bias when at least five covariates are assessed in each domain; and (4) omitting the most crucial covariates generally replicates the pattern of results due to the number of domains and the number of covariates per domain, but the amount of bias reduction is less than when all variables are included and will surely not satisfy all consumers of causal research.
Keywords: Observational study, causal inference, propensity score, covariate selection