# Deriving the front-door criterion with the do-calculus

Attention conservation notice: Narrow target audience - will only make sense to people somewhat familiar with causal inference who don’t find the result entirely boring.

## The Front-Door Criterion

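Suppose we have a causal graphical model with the standard front-door structure: $X \rightarrow M \rightarrow Y$, together with a common cause $U$ of $X$ and $Y$ ($U \rightarrow X$, $U \rightarrow Y$). Assume $U$ is unmeasured whereas $X, M, Y$ can be measured. Notice that: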

1. All directed paths from $X$ to $Y$ flow through $M$.
2. $X$ blocks all back-door paths from $M$ to $Y$.
3. There are no unblocked back-door paths from $X$ to $M$.

One of the most striking results in the early causal inference literature, called the front-door criterion, states that, for all graphs like ours that satisfy these three conditions, the causal effect $P(y \mid \mathrm{do}(x))$ is identifiable by the formula (assuming discrete variables for convenience)

$$P(y \mid \mathrm{do}(x)) = \sum_{m \in M} P(m \mid x) \sum_{x' \in X} P(y \mid m, x') P(x').$$

Intuitively, this is ‘striking’ because it shows that we can identify the effect of $X$ on $Y$ from observational data alone, as long as we can measure a mediator that carries the whole effect, even if there’s unmeasured confounding between $X$ and $Y$.
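As a sanity check on the formula, here is a minimal numerical sketch in Python. It assumes the standard front-door graph ($U \rightarrow X$, $U \rightarrow Y$, $X \rightarrow M$, $M \rightarrow Y$) with binary variables; every probability table and function name below is made up for illustration. It computes $P(y \mid \mathrm{do}(x))$ twice: once from the front-door formula using only the observational joint over $X, M, Y$, and once from the full model, which uses $U$ and would not be available in practice.

```python
from itertools import product

# Hypothetical binary front-door model: U -> X, U -> Y, X -> M, M -> Y.
# All numbers are invented for illustration.
p_u = {0: 0.6, 1: 0.4}                        # P(u)
p_x1_given_u = {0: 0.2, 1: 0.7}               # P(X=1 | u)
p_m1_given_x = {0: 0.1, 1: 0.8}               # P(M=1 | x)
p_y1_given_mu = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.4, (1, 1): 0.9}    # P(Y=1 | m, u)

def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one equals value."""
    return p_one if value == 1 else 1.0 - p_one

# Observational joint over the measured variables, with U summed out.
joint = {}
for u, x, m, y in product((0, 1), repeat=4):
    p = (p_u[u] * bern(p_x1_given_u[u], x)
         * bern(p_m1_given_x[x], m) * bern(p_y1_given_mu[(m, u)], y))
    joint[(x, m, y)] = joint.get((x, m, y), 0.0) + p

def prob(**fixed):
    """Marginal probability of the given x/m/y values under `joint`."""
    total = 0.0
    for (x, m, y), p in joint.items():
        values = {"x": x, "m": m, "y": y}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total

def front_door(y, x):
    """Front-door expression for P(y | do(x)), observational terms only."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = prob(x=x, m=m) / prob(x=x)
        inner = sum(prob(x=x2, m=m, y=y) / prob(x=x2, m=m) * prob(x=x2)
                    for x2 in (0, 1))
        total += p_m_given_x * inner
    return total

def truth(y, x):
    """P(y | do(x)) from the full model, using the unmeasured U."""
    return sum(p_u[u] * bern(p_m1_given_x[x], m)
               * bern(p_y1_given_mu[(m, u)], y)
               for u in (0, 1) for m in (0, 1))

for x in (0, 1):
    naive = prob(x=x, y=1) / prob(x=x)        # plain conditioning, confounded
    print(x, round(front_door(1, x), 6), round(truth(1, x), 6), round(naive, 6))
```

The first two printed columns should agree for each value of $x$, while the third (plain conditioning on $x$) generally differs because of the confounding through $U$.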

As an exercise, I thought it would be fun to use the do-calculus to re-derive the front-door criterion (for a representative front-door graph).

## Do-Calculus Brief Refresher

In this section, I quickly review the rules of the do-calculus and the intuition for them. For a more in-depth but understandable presentation of the do-calculus, I recommend Michael Nielsen’s article on the topic. For a “classic”, more technical presentation of the do-calculus, see Judea Pearl’s “The Do-Calculus Revisited”.

The do-calculus provides a complete¹ algebra for transforming causal quantities into observational ones, i.e., for transforming probabilities that include $\mathrm{do}(\cdot)$ terms into probabilities that include only ordinary conditioning. Completeness means that a causal query (an expression containing $\mathrm{do}$ terms) is identifiable if and only if the do-calculus can transform it into an observational expression.

### Do-Calculus Rules

The actual calculus consists of three rules for transforming $\mathrm{do}$ queries into observational ones. Since the rules are complete, they include a little extra baggage, but each one comes from an understandable intuition that I’ll describe below.

#### Notation

For each rule, we assume that we have a causal model that includes four disjoint node sets $X, Z, Y, W$ in a graph $G$. All three rules treat $Y$ as the outcome variable and $Z$ as the term(s) we want to remove or transform. $G_{\overline{X}}$ denotes a modified version of the original graph $G$ in which all arrows going into nodes in $X$ have been removed. $G_{\underline{X}}$ denotes a modified version of the original graph $G$ in which all arrows coming out of nodes in $X$ have been removed. In practice, when we apply these rules, we’ll mostly be using single-node sets for $X, Z, Y$, but we still list the set versions to align with other presentations.
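To make the two graph operations concrete, here is a small Python sketch. It assumes the standard front-door graph ($U \rightarrow X$, $U \rightarrow Y$, $X \rightarrow M$, $M \rightarrow Y$) stored as a set of (parent, child) pairs; the helper names `remove_incoming` and `remove_outgoing` are my own, not taken from any library.

```python
# Edges of the assumed front-door graph, as (parent, child) pairs.
G = {("U", "X"), ("U", "Y"), ("X", "M"), ("M", "Y")}

def remove_incoming(edges, nodes):
    """Overline operation: drop every arrow pointing into `nodes`."""
    return {(a, b) for (a, b) in edges if b not in nodes}

def remove_outgoing(edges, nodes):
    """Underline operation: drop every arrow leaving `nodes`."""
    return {(a, b) for (a, b) in edges if a not in nodes}

G_bar_X = remove_incoming(G, {"X"})       # U -> X is removed
G_under_X = remove_outgoing(G, {"X"})     # X -> M is removed
# Both operations combined, as in the condition of Rule 2.
G_bar_X_under_M = remove_outgoing(remove_incoming(G, {"X"}), {"M"})

print(sorted(G_bar_X))
print(sorted(G_under_X))
print(sorted(G_bar_X_under_M))
```

Representing the graph as a plain edge set keeps both operations to one-line filters; a graph library would work just as well.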

#### Rule 1: When we can ignore an observation

Rule 1 says that we can ignore an observation $z$ when, after intervening on $X$ and conditioning on $W$, it carries no further information about the outcome: no open path connects $Z$ to $Y$ in the graph with the arrows into $X$ removed. We formalize this as

$$P(y \mid z, \mathrm{do}(x), w) = P(y \mid \mathrm{do}(x), w) \text{ if } (Y \perp Z \mid W, X)_{G_{\overline{X}}}.$$

#### Rule 2: When we can treat an intervention as an observation

Rule 2 says that observing $z$ and intervening to set $z$ are equivalent when $Z$ influences the outcome only through directed paths, i.e., when, given $W$ and the intervention on $X$, no back-door path from $Z$ to $Y$ remains open. We formalize this as

$$P(y \mid \mathrm{do}(z), \mathrm{do}(x), w) = P(y \mid z, \mathrm{do}(x), w) \text{ if } (Y \perp Z \mid W, X)_{G_{\overline{X}, \underline{Z}}}.$$

Note: Rule 2 can also be thought of as a generalization of the back-door criterion in which $\mathrm{do}(X), W$ together form a back-door admissible set.

#### Rule 3: When we can ignore an intervention

Rule 3 says that we can ignore an intervention when it doesn’t influence the outcome through any path, formalized as

$$P(y \mid \mathrm{do}(z), \mathrm{do}(x), w) = P(y \mid \mathrm{do}(x), w) \text{ if } (Y \perp Z \mid W, X)_{G_{\overline{X}, \overline{Z(W)}}}.$$

$G_{\overline{Z(W)}}$ denotes the graph in which we remove all incoming arrows to those nodes in $Z$ that aren’t ancestors of any node in $W$ (with ancestry determined in $G_{\overline{X}}$). For example, if I have a graph $G = U \rightarrow Z \rightarrow W \rightarrow Y$, then $G_{\overline{Z(W)}} = G$ because $Z$ is an ancestor of $W$, so no arrows are removed.
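The $Z(W)$ construction is easier to see in code. The Python sketch below follows Pearl’s definition, in which ancestors are computed in $G_{\overline{X}}$, and replays the chain example from the text; the function names are hypothetical helpers written for this illustration.

```python
def remove_incoming(edges, nodes):
    """Drop every arrow pointing into `nodes`."""
    return {(a, b) for (a, b) in edges if b not in nodes}

def ancestors(edges, targets):
    """Nodes with a directed path to at least one node in `targets`."""
    found, frontier = set(), set(targets)
    while frontier:
        frontier = {a for (a, b) in edges if b in frontier} - found
        found |= frontier
    return found

def rule3_graph(edges, X, Z, W):
    """Return Z(W) and the graph used in the Rule 3 condition."""
    g_bar_x = remove_incoming(edges, X)
    # Z(W): the Z-nodes that are not ancestors of any W-node in G with
    # arrows into X removed.
    z_w = set(Z) - ancestors(g_bar_x, W)
    return z_w, remove_incoming(g_bar_x, z_w)

# The chain from the text: U -> Z -> W -> Y.
chain = {("U", "Z"), ("Z", "W"), ("W", "Y")}

z_w, g = rule3_graph(chain, X=set(), Z={"Z"}, W={"W"})
print(z_w, g == chain)      # Z is an ancestor of W, so nothing is cut

# With W empty, Z is not an ancestor of any W-node, so U -> Z is cut.
z_w_empty, g_empty = rule3_graph(chain, X=set(), Z={"Z"}, W=set())
print(z_w_empty, sorted(g_empty))
```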

## Derivation with Do-Calculus

Assuming we have the graph given above and start with $P(y \mid \mathrm{do}(x))$, we can use the following series of do-calculus and probability transformations to derive an expression that only includes observational probabilities:

$$
\begin{aligned}
P(y \mid \mathrm{do}(x))
&= \sum_{m} P(y \mid \mathrm{do}(x), m) P(m \mid \mathrm{do}(x)) & \text{(marginalize over } m \text{)} \\
&= \sum_{m} P(y \mid \mathrm{do}(x), \mathrm{do}(m)) P(m \mid x) & \text{(Rule 2 twice)} \\
&= \sum_{m, u} P(y \mid \mathrm{do}(x), \mathrm{do}(m), u) P(u \mid \mathrm{do}(x), \mathrm{do}(m)) P(m \mid x) & \text{(condition on } u \text{)} \\
&= \sum_{m, u} P(y \mid \mathrm{do}(m), u) P(u \mid \mathrm{do}(m)) P(m \mid x) & \text{(Rule 3 on each factor)} \\
&= \sum_{m} P(y \mid \mathrm{do}(m)) P(m \mid x) & \text{(marginalize out } u \text{)} \\
&= \sum_{m} P(m \mid x) \sum_{x'} P(y \mid \mathrm{do}(m), x') P(x' \mid \mathrm{do}(m)) & \text{(introduce auxiliary } x' \text{)} \\
&= \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x') P(x' \mid \mathrm{do}(m)) & \text{(Rule 2)} \\
&= \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x') P(x'). & \text{(Rule 3)}
\end{aligned}
$$
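Each rule application in the derivation depends on a d-separation statement in a modified graph, and these can be checked mechanically. The Python sketch below assumes the standard front-door graph with the mediator named `M`, and tests d-separation with the moralized ancestral graph method. All helper names are my own. For brevity, the Rule 3 step that drops $\mathrm{do}(x)$ is checked directly, without the intermediate $u$.

```python
# Assumed front-door graph as (parent, child) pairs; M is the mediator.
G = {("U", "X"), ("U", "Y"), ("X", "M"), ("M", "Y")}

def cut_in(edges, nodes):
    """Remove arrows pointing into `nodes` (the overline operation)."""
    return {(a, b) for (a, b) in edges if b not in nodes}

def cut_out(edges, nodes):
    """Remove arrows leaving `nodes` (the underline operation)."""
    return {(a, b) for (a, b) in edges if a not in nodes}

def ancestors(edges, targets):
    """Nodes with a directed path to at least one node in `targets`."""
    found, frontier = set(), set(targets)
    while frontier:
        frontier = {a for (a, b) in edges if b in frontier} - found
        found |= frontier
    return found

def d_separated(edges, A, B, C=frozenset()):
    """True if A and B are d-separated given C in the DAG `edges`."""
    A, B, C = set(A), set(B), set(C)
    keep = A | B | C
    keep |= ancestors(edges, keep)             # ancestral subgraph
    sub = {(a, b) for (a, b) in edges if a in keep and b in keep}
    links = {frozenset(e) for e in sub}        # drop edge directions
    for child in keep:                         # moralize: marry co-parents
        parents = [a for (a, b) in sub if b == child]
        links |= {frozenset((p, q)) for p in parents for q in parents if p != q}
    links = {e for e in links if not (e & C)}  # delete conditioning nodes
    reach, frontier = set(A), set(A)
    while frontier:                            # undirected reachability from A
        frontier = {n for e in links if e & frontier for n in e} - reach
        reach |= frontier
    return not (reach & B)

checks = {
    "Rule 2: P(m | do(x)) = P(m | x)":
        d_separated(cut_out(G, {"X"}), {"M"}, {"X"}),
    "Rule 2: P(y | do(x), m) = P(y | do(x), do(m))":
        d_separated(cut_out(cut_in(G, {"X"}), {"M"}), {"Y"}, {"M"}, {"X"}),
    "Rule 3: P(y | do(x), do(m)) = P(y | do(m))":
        d_separated(cut_in(G, {"X", "M"}), {"Y"}, {"X"}, {"M"}),
    "Rule 2: P(y | do(m), x') = P(y | m, x')":
        d_separated(cut_out(G, {"M"}), {"Y"}, {"M"}, {"X"}),
    "Rule 3: P(x' | do(m)) = P(x')":
        d_separated(cut_in(G, {"M"}), {"X"}, {"M"}),
}
for step, holds in checks.items():
    print(("ok      " if holds else "FAILED  ") + step)

# Negative control: Rule 2 cannot turn P(y | do(x)) into P(y | x) directly,
# because X and Y stay connected through U once arrows out of X are cut.
print(d_separated(cut_out(G, {"X"}), {"Y"}, {"X"}))
```

The negative control at the end shows why the derivation has to route through the mediator: the back-door path through $U$ rules out applying Rule 2 to $P(y \mid \mathrm{do}(x))$ in one step.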

1. Ilya Shpitser’s work proving the completeness of the do-calculus; note that I haven’t read this and currently don’t know the completeness proof.