Identification in causal models with hidden variables

Ilya Shpitser, Johns Hopkins University


Targets of inference that establish causality are phrased in terms of counterfactual responses to interventions. These potential outcomes operationalize cause-effect relationships by comparing cases and controls in hypothetical randomized controlled experiments. In many applied settings, data from such experiments are not directly available, which necessitates assumptions linking the counterfactual target of inference to the factual observed data distribution. This link is provided by causal models. Originally defined directly in terms of potential outcomes (Rubin, 1976), causal models have been extended to longitudinal settings (Robins, 1986) and reformulated as graphical models (Spirtes et al., 2001; Pearl, 2009). In settings where all common causes of the observed variables are themselves observed, many causal inference targets are identified via variations of an expression referred to in the literature as the g-formula (Robins, 1986), the manipulated distribution (Spirtes et al., 2001), or the truncated factorization (Pearl, 2009). When hidden variables are present, identification results become considerably more complicated. In this manuscript, we review identification theory in causal models with hidden variables for common targets that arise in causal inference applications, including causal effects; direct, indirect, and path-specific effects; and outcomes of dynamic treatment regimes. We describe a simple formulation of this theory (Tian and Pearl, 2002; Shpitser and Pearl, 2006b,a; Tian, 2008; Shpitser, 2013) in terms of causal graphical models and the fixing operator, a statistical analogue of the intervention operation (Richardson et al., 2017).
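To make the fully observed case concrete, consider the simplest graph C → A, C → Y, A → Y, where C is an observed common cause of treatment A and outcome Y. Here the g-formula reduces to p(Y(a) = y) = Σ_c p(y | a, c) p(c). The sketch below is illustrative only: the binary variables and all probability tables (P_C1, P_A1_GIVEN_C, P_Y1_GIVEN_AC) are hypothetical assumptions, not taken from this manuscript. It computes the g-formula from the factual joint distribution alone and contrasts it with the confounded conditional p(Y = 1 | A = 1).

```python
from itertools import product

# Hypothetical binary model C -> A, C -> Y, A -> Y, with C observed.
P_C1 = 0.4                                  # p(C = 1)
P_A1_GIVEN_C = {0: 0.2, 1: 0.8}             # p(A = 1 | C = c)
P_Y1_GIVEN_AC = {(0, 0): 0.1, (0, 1): 0.5,  # p(Y = 1 | A = a, C = c), keyed by (a, c)
                 (1, 0): 0.3, (1, 1): 0.7}


def bern(p1, x):
    """Probability that a Bernoulli(p1) variable takes the value x."""
    return p1 if x == 1 else 1.0 - p1


def observed_joint():
    """Factual joint p(c, a, y) = p(c) p(a | c) p(y | a, c), keyed by (c, a, y)."""
    return {
        (c, a, y): bern(P_C1, c) * bern(P_A1_GIVEN_C[c], a) * bern(P_Y1_GIVEN_AC[(a, c)], y)
        for c, a, y in product((0, 1), repeat=3)
    }


def g_formula(joint, a, y=1):
    """p(Y(a) = y) = sum_c p(y | a, c) p(c), using only the observed joint."""
    total = 0.0
    for c in (0, 1):
        p_c = sum(joint[(c, a2, y2)] for a2 in (0, 1) for y2 in (0, 1))
        p_ac = joint[(c, a, 0)] + joint[(c, a, 1)]
        total += joint[(c, a, y)] / p_ac * p_c
    return total


def naive_conditional(joint, a, y=1):
    """p(Y = y | A = a): confounded by C, so generally not the causal target."""
    p_a = sum(joint[(c, a, y2)] for c in (0, 1) for y2 in (0, 1))
    return sum(joint[(c, a, y)] for c in (0, 1)) / p_a


joint = observed_joint()
print(round(g_formula(joint, a=1), 4))          # 0.46
print(round(naive_conditional(joint, a=1), 4))  # 0.5909
```

Because C is observed, reweighting p(y | a, c) by the marginal p(c) removes confounding by C. If C were hidden, this expression could not be evaluated from the observed data, which is the situation the identification theory reviewed here addresses.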