Stephen Malina
This is my blog. There are many others like it but this one is mine.
Deriving the front-door criterion with the do-calculus
Attention conservation notice: Narrow target audience; this post will only make sense to people somewhat familiar with causal inference who don’t find the result entirely boring.
The Front-Door Criterion
Suppose we have a causal graphical model with edges $ X \rightarrow M \rightarrow Y $ and $ X \leftarrow U \rightarrow Y $, the representative front-door graph.
Assume $ U $ is unmeasured whereas $ X, M, Y $ can be measured. Notice that:
 All directed paths from $ X $ to $ Y $ flow through $ M $.
 $ X $ blocks all backdoor paths from $ M $ to $ Y $.
 There are no unblocked backdoor paths from $ X $ to $ M $.
One of the most striking results in the early causal inference literature, called the front-door criterion, states that, for all graphs like ours that satisfy these three criteria, the causal effect $ P(y \mid \mathrm{do}(x)) $ is identifiable by the formula (assume discrete variables for convenience)
$$ P(y \mid \mathrm{do}(x)) = \sum_{m \in M} P(m \mid x) \sum_{x' \in X} P(y \mid m, x') P(x'). $$
Intuitively, this is ‘striking’ because it shows that we can identify the causal effect of $ X $ on $ Y $ using only distributions over the observed variables, by routing the effect through the mediator $ M $, even when there’s unmeasured confounding between $ X $ and $ Y $.
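To make this concrete, here’s a small numeric check on the representative front-door graph ($ X \rightarrow M \rightarrow Y $, $ X \leftarrow U \rightarrow Y $). All parameter values below are made up for illustration; the point is that the front-door formula, computed from the observational distribution over $ X, M, Y $ alone, matches the interventional distribution computed from the full structural model that includes the latent $ U $:

```python
import itertools

# Made-up binary front-door model: U -> X, U -> Y, X -> M, M -> Y,
# with U latent (only X, M, Y would be observed in practice).
pU = 0.3                              # P(U=1)
pX_U = [0.2, 0.7]                     # P(X=1 | U=u)
pM_X = [0.4, 0.8]                     # P(M=1 | X=x)
pY_MU = [[0.1, 0.5], [0.6, 0.9]]      # P(Y=1 | M=m, U=u)

def b(p, v):
    """P(V=v) for a Bernoulli(p) variable."""
    return p if v else 1 - p

# Observational joint over the measured variables, P(x, m, y),
# with the latent U summed out.
Pxmy = {}
for x, m, y in itertools.product((0, 1), repeat=3):
    Pxmy[x, m, y] = sum(
        b(pU, u) * b(pX_U[u], x) * b(pM_X[x], m) * b(pY_MU[m][u], y)
        for u in (0, 1)
    )

def frontdoor(x, y):
    """Front-door estimate of P(y | do(x)), using observables only."""
    Px = {v: sum(Pxmy[v, m, yy] for m in (0, 1) for yy in (0, 1))
          for v in (0, 1)}
    total = 0.0
    for m in (0, 1):
        p_m_given_x = sum(Pxmy[x, m, yy] for yy in (0, 1)) / Px[x]
        inner = sum(
            Pxmy[xp, m, y] / sum(Pxmy[xp, m, yy] for yy in (0, 1)) * Px[xp]
            for xp in (0, 1)
        )
        total += p_m_given_x * inner
    return total

def truth(x, y):
    """Ground-truth P(y | do(x)) computed from the structural model:
    do(x) cuts U -> X, so we sum over u and m directly."""
    return sum(
        b(pU, u) * b(pM_X[x], m) * b(pY_MU[m][u], y)
        for u in (0, 1) for m in (0, 1)
    )

print(abs(frontdoor(1, 1) - truth(1, 1)) < 1e-12)  # True
```

The agreement here is exact (up to floating point), not approximate: the front-door formula is an algebraic identity for this graph, as the derivation below shows.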
As an exercise, I thought it would be fun to use the do-calculus to rederive the front-door criterion (for a representative front-door graph).
Do-Calculus: A Brief Refresher
In this section, I quickly review the rules of the do-calculus and the intuition behind them. For a more in-depth but understandable presentation of the do-calculus, I recommend Michael Nielsen’s article^{1} on the topic. For a “classic”, more technical presentation, see Judea Pearl’s “The Do-Calculus Revisited”^{2}.
The do-calculus provides a complete^{3} algebra for transforming causal quantities into observational ones, i.e., for transforming probabilities that include $ \mathrm{do}(\cdot) $ terms into probabilities that include only ordinary conditional terms. Completeness means that a query (a $ \mathrm{do} $ term) is identifiable if and only if we can use the do-calculus to compute its observational equivalent.
Do-Calculus Rules
The actual calculus consists of three rules for transforming $ \mathrm{do} $ queries into observational ones. Since the rules are complete, they carry a little extra baggage, but each one comes from an understandable intuition that I’ll describe below.
Notation
For each rule, we assume that we have a causal model that includes at least four node sets $ X, Z, Y, W $ in a graph $ G $. All three rules treat $ Y $ as the outcome variable and $ Z $ as the term(s) we want to remove or transform. $ G_{\overline{X}} $ denotes a modified version of the original graph $ G $ in which all arrows going into nodes in $ X $ have been removed. $ G_{\underline{X}} $ denotes a modified version of the original graph $ G $ in which all arrows coming out of nodes in $ X $ have been removed. In practice, when we apply these rules, we’ll mostly be using single-node sets for $ X, Z, Y $, but we still list the set versions to align with other presentations.
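In code, the two graph mutilations are simple edge filters. Here’s a minimal sketch (the edge-list representation and helper names are my own), using the front-door graph’s node names from the figure:

```python
def g_overline(edges, X):
    """G_{overline{X}}: drop every arrow pointing INTO a node of X."""
    return [(a, b) for a, b in edges if b not in X]

def g_underline(edges, X):
    """G_{underline{X}}: drop every arrow coming OUT of a node of X."""
    return [(a, b) for a, b in edges if a not in X]

# Front-door graph: U -> X, U -> Y, X -> M, M -> Y.
G = [("U", "X"), ("U", "Y"), ("X", "M"), ("M", "Y")]
print(g_overline(G, {"X"}))   # [('U', 'Y'), ('X', 'M'), ('M', 'Y')]
print(g_underline(G, {"X"}))  # [('U', 'X'), ('U', 'Y'), ('M', 'Y')]
```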
Rule 1: When we can ignore an observation
Rule 1 says that we can ignore an observation of a quantity when it doesn’t influence the outcome through any path. We formalize this as $$ P(y \mid z, \mathrm{do}(x), w) = P(y \mid \mathrm{do}(x), w) \text{ if } (Y \perp Z \mid W, X)_{G_{\overline{X}}}. $$
Rule 2: When we can treat an intervention as an observation
Rule 2 says that observing a variable and intervening on it are equivalent when the variable influences the outcome only through directed paths (i.e., there are no unblocked backdoor paths from it to the outcome). We formalize this as
$$ P(y \mid \mathrm{do}(z), \mathrm{do}(x), w) = P(y \mid z, \mathrm{do}(x), w) \text{ if } (Y \perp Z \mid W, X)_{G_{\overline{X}, \underline{Z}}}. $$
Note: Rule 2 can also be thought of as a generalization of the backdoor criterion in which $ \mathrm{do}(X), W $ together form a backdoor admissible set.
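Each rule’s premise is just a d-separation test in a mutilated graph, so it can be checked mechanically. Below is a minimal sketch of such a checker using the moral-ancestral-graph criterion (restrict to the ancestral subgraph, moralize, delete the conditioning set, test connectivity); the representation and helper names are my own, and the front-door graph’s node names are assumed from the figure.

```python
from itertools import combinations

def ancestors(edges, nodes):
    """All ancestors of `nodes` in the DAG `edges`, the nodes included."""
    anc = set(nodes)
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            if b in anc and a not in anc:
                anc.add(a)
                changed = True
    return anc

def d_separated(edges, xs, ys, zs):
    """Is (xs ⊥ ys | zs) in the DAG `edges`? Uses the classic
    moral-ancestral-graph criterion."""
    keep = ancestors(edges, set(xs) | set(ys) | set(zs))
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    und = {frozenset(e) for e in sub}            # undirected skeleton
    for node in keep:                            # "marry" co-parents
        parents = [a for a, b in sub if b == node]
        for p, q in combinations(parents, 2):
            und.add(frozenset((p, q)))
    und = {e for e in und if not (e & set(zs))}  # delete conditioning set
    frontier, seen = set(xs), set(xs)            # BFS from xs
    while frontier:
        nxt = set()
        for e in und:
            a, b = tuple(e)
            if a in frontier and b not in seen:
                nxt.add(b)
            if b in frontier and a not in seen:
                nxt.add(a)
        seen |= nxt
        frontier = nxt
    return not (seen & set(ys))

# Front-door graph: U -> X, U -> Y, X -> M, M -> Y.
G = [("U", "X"), ("U", "Y"), ("X", "M"), ("M", "Y")]
# Rule 2 premise for P(m | do(x)) = P(m | x): (M ⊥ X) must hold in
# G with X's outgoing edges removed.
G_under_X = [(a, b) for a, b in G if a != "X"]
print(d_separated(G_under_X, {"M"}, {"X"}, set()))  # True
print(d_separated(G, {"M"}, {"X"}, set()))          # False: X -> M is open
```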
Rule 3: When we can ignore an intervention
Rule 3 says that we can ignore an intervention when it doesn’t influence the outcome through any path, formalized as
$$ P(y \mid \mathrm{do}(z), \mathrm{do}(x), w) = P(y \mid \mathrm{do}(x), w) \text{ if } (Y \perp Z \mid W, X)_{G_{\overline{X}, \overline{Z(W)}}}. $$
$ G_{\overline{Z(W)}} $ means “the graph in which we remove all incoming arrows to nodes in $ Z $ that aren’t ancestors of any node in $ W $”. For example, if I have a graph $ G = U \rightarrow Z \rightarrow W \rightarrow Y $, then $ G_{\overline{Z(W)}} = G $ because $ Z $ is an ancestor of $ W $.
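This mutilation is easy to compute: $ Z(W) $ is just the set of $ Z $-nodes outside the ancestor set of $ W $. A minimal sketch (the representation and helper names are my own), reproducing the example above:

```python
def ancestors(edges, nodes):
    """All ancestors of `nodes` in the DAG `edges`, the nodes included."""
    anc = set(nodes)
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            if b in anc and a not in anc:
                anc.add(a)
                changed = True
    return anc

def g_overline_zw(edges, Z, W):
    """G_{overline{Z(W)}}: remove incoming arrows of every Z-node that
    is not an ancestor of some node in W."""
    blocked = set(Z) - ancestors(edges, W)
    return [(a, b) for a, b in edges if b not in blocked]

# The example from the text: G = U -> Z -> W -> Y.
G = [("U", "Z"), ("Z", "W"), ("W", "Y")]
print(g_overline_zw(G, {"Z"}, {"W"}) == G)  # True: Z is an ancestor of W
```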
Derivation with the Do-Calculus
Assuming we have the graph given above and start with $ P(y \mid \mathrm{do}(x)) $, we can use the following series of do-calculus and probability transformations to derive an expression that only includes observational probabilities:
$$ \begin{aligned} & P(y \mid \mathrm{do}(x)) \\ &= \sum_m P(y \mid \mathrm{do}(x), m) P(m \mid \mathrm{do}(x)) & \text{(condition on } m \text{)} \\ &= \sum_m P(y \mid \mathrm{do}(x), \mathrm{do}(m)) P(m \mid x) & \text{(Rule 2, twice)} \\ &= \sum_{m, u} P(y \mid \mathrm{do}(x), \mathrm{do}(m), u) P(u \mid \mathrm{do}(x), \mathrm{do}(m)) P(m \mid x) & \text{(condition on } u \text{)} \\ &= \sum_{m, u} P(y \mid \mathrm{do}(m), u) P(u \mid \mathrm{do}(m)) P(m \mid x) & \text{(Rule 3, twice)} \\ &= \sum_{m} P(y \mid \mathrm{do}(m)) P(m \mid x) & \text{(marginalize out } u \text{)} \\ &= \sum_{m} P(m \mid x) \sum_{x'} P(y \mid \mathrm{do}(m), x') P(x' \mid \mathrm{do}(m)) & \text{(condition on auxiliary } x' \text{)} \\ &= \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x') P(x' \mid \mathrm{do}(m)) & \text{(Rule 2)} \\ &= \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x') P(x'). & \text{(Rule 3)} \end{aligned} $$

Michael Nielsen, “If correlation doesn’t imply causation, then what does?” ↩︎
Judea Pearl, “The Do-Calculus Revisited”, a primer by Pearl on the do-calculus. ↩︎
Ilya Shpitser’s work proving the completeness of the do-calculus: note that I haven’t read this and currently don’t know the completeness proof. ↩︎