Researchers are often interested in estimating the causal effects of sustained treatment strategies, i.e., of (hypothetical) interventions involving time-varying treatments. When using observational data, estimating those effects requires adjustment for confounding. However, conventional regression methods cannot appropriately adjust for confounding in the presence of treatment-confounder feedback. In contrast, estimators derived from Robins's g-formula may correctly adjust for confounding even if treatment-confounder feedback exists. The gfoRmula package implements one such estimator in R: the parametric g-formula. This estimator can be used to estimate the effects of binary or continuous time-varying treatments, as well as contrasts defined by static or dynamic, deterministic or random interventions and by interventions that depend on the natural value of treatment. The package accommodates survival outcomes as well as binary or continuous outcomes measured at the end of follow-up. This paper describes the gfoRmula package, along with motivating background, features, and examples.

Causal inference is a core task of data science. When data from randomized experiments are not available, data analysts often rely on nonexperimental (observational) data to estimate causal effects. The parametric g-formula is a statistical method to estimate the causal effects of sustained treatment strategies from observational data with time-varying treatments, confounders, and outcomes. Although this methodology was introduced in the 1980s, it has not been widely used due to the lack of open-source software. This article presents the gfoRmula package, an implementation of the parametric g-formula in R. The aim of this software is to facilitate the application of the parametric g-formula to complex observational data to answer causal questions. Furthermore, the package helps provide a way to compare the performance of the parametric g-formula with other methods in the causal inference literature.

McGrath et al. present the statistical software package gfoRmula. This package implements the parametric g-formula, a statistical method to estimate the causal effects of sustained treatment strategies from observational data with time-varying treatments, confounders, and outcomes.
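
To make the estimator concrete, here is a minimal Python sketch of the Monte Carlo algorithm behind the parametric g-formula, under strong simplifying assumptions: two time points, one binary time-varying confounder L, a binary treatment A, and a binary outcome Y measured at the end of follow-up, all generated from a hypothetical data-generating process with treatment-confounder feedback (A0 affects L1, which in turn affects A1). The function names (`simulate_observed`, `fit_saturated`, `g_formula_risk`) are hypothetical and do not reflect the gfoRmula API. Because every variable here is binary, saturated models (within-stratum proportions) stand in for the user-specified parametric regression models that the package fits.

```python
import random


def simulate_observed(n, rng):
    """Simulate hypothetical observational data with treatment-confounder feedback.

    Each row is (L0, A0, L1, A1, Y): L0 confounds A0, A0 affects L1,
    L1 confounds A1, and all of them affect the binary outcome Y.
    """
    rows = []
    for _ in range(n):
        l0 = int(rng.random() < 0.5)
        a0 = int(rng.random() < 0.3 + 0.4 * l0)
        l1 = int(rng.random() < 0.2 + 0.4 * l0 - 0.1 * a0)  # A0 -> L1 (feedback)
        a1 = int(rng.random() < 0.2 + 0.3 * l1 + 0.3 * a0)  # L1 -> A1 (confounding)
        y = int(rng.random() < 0.35 + 0.2 * l0 + 0.3 * l1 - 0.15 * a0 - 0.15 * a1)
        rows.append((l0, a0, l1, a1, y))
    return rows


def fit_saturated(rows, key, target):
    """Estimate P(target = 1 | key) by within-stratum proportions (a saturated model)."""
    counts = {}
    for r in rows:
        k = key(r)
        ones, total = counts.get(k, (0, 0))
        counts[k] = (ones + target(r), total + 1)
    return {k: ones / total for k, (ones, total) in counts.items()}


def g_formula_risk(rows, strategy, n_sim=20000, seed=1):
    """Monte Carlo g-formula estimate of P(Y = 1) had everyone followed `strategy`.

    `strategy(t, l)` returns the treatment at time t given the current
    covariate value l; static strategies simply ignore l.
    """
    rng = random.Random(seed)
    # Step 1: fit models for the time-varying confounder and the outcome.
    p_l1 = fit_saturated(rows, key=lambda r: (r[0], r[1]), target=lambda r: r[2])
    p_y = fit_saturated(rows, key=lambda r: r[:4], target=lambda r: r[4])
    baseline = [r[0] for r in rows]  # empirical distribution of L0
    # Step 2: simulate covariate histories forward under the intervention.
    total = 0.0
    for _ in range(n_sim):
        l0 = rng.choice(baseline)
        a0 = strategy(0, l0)
        l1 = int(rng.random() < p_l1[(l0, a0)])  # L1 drawn given the intervened A0
        a1 = strategy(1, l1)
        # Step 3: average the predicted outcome risk over simulated histories.
        total += p_y[(l0, a0, l1, a1)]
    return total / n_sim


def always_treat(t, l):
    return 1


def never_treat(t, l):
    return 0


def treat_if_l(t, l):
    # Dynamic strategy: treat at time t only when the confounder equals 1.
    return l


data = simulate_observed(20000, random.Random(2024))
risk_always = g_formula_risk(data, always_treat)
risk_never = g_formula_risk(data, never_treat)
risk_dynamic = g_formula_risk(data, treat_if_l)
print(f"always: {risk_always:.3f}  never: {risk_never:.3f}  dynamic: {risk_dynamic:.3f}")
print(f"risk difference (always - never): {risk_always - risk_never:.3f}")
```

For this hypothetical process, the true risks work out analytically to 0.24 under "always treat", 0.57 under "never treat", and 0.4275 under the dynamic "treat whenever L = 1" strategy, so the printed estimates should land close to those values. The sketch averages predicted outcome probabilities rather than drawing Y, which lowers Monte Carlo variance. A conventional regression of Y on A0 and A1 that also adjusts for L1 would in general be biased for these contrasts, because L1 is both a confounder for A1 and affected by A0.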
- DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem
- causal inference
- longitudinal data