Paper Review - Network Mendelian Randomization

In which I record my thoughts on Network Mendelian Randomization by Burgess et al.

What is this paper about? #

This paper describes a method for doing Mendelian Randomization (MR) in the presence of a potential mediating variable. In the typical MR setting, we have an instrumental variable which “instruments” an exposure that we believe causally influences the outcome we care about. True mediators “mediate” the causal influence of exposures on outcomes. In other words, when we’re trying to understand the causal influence of an exposure on an outcome, some of that influence may flow through a third factor (the mediator) that the exposure influences and that in turn influences the outcome.

This paper presents two methods (one regression-based, one SEM-based) for estimating the “direct” and “indirect” effects of an exposure on the outcome, assuming we have instrumental variables for both the exposure and the mediator. Like basically all of the MR papers I’ve read, this paper requires assuming linear effects between variables and homogeneous effects across individuals. While I understand why the linear effects assumption makes computing the values we care about much easier, I’m still a bit confused about why everyone else seems to feel so comfortable with it from a philosophical perspective. That said, it’s ubiquitous, so I can’t fault the authors for keeping it.

One limitation of this paper is that, with the exception of the IV, the authors assume all variables are continuous, so this method applies more to the effect of measured levels of some quantity than to presence/absence (binary/categorical) variables. They claim extension to binary variables isn’t hard, but I’m always wary of “it’s trivial” claims. In fairness, I also haven’t looked to see if they have a follow-up paper discussing how to extend this to binary variables.

Technical Methods #

The regression-based method the authors describe for estimating direct/indirect effects looks very similar to the standard MR method. In standard MR, we typically regress both the exposure and the outcome on the IV and then divide the outcome coefficient by the exposure coefficient to estimate the causal effect. Their version of this method just extends this to the mediator setting. Concretely, to estimate the (natural) direct effect of $ X $ (exposure) on $ Y $ (outcome) with mediator $ Z $, they derive the following formula,

$$ \hat{\beta}_{X \Rightarrow Y} = \hat{\beta}_{X \rightarrow Y} - \hat{\beta}_{X \rightarrow Z} \hat{\beta}_{Z \rightarrow Y}. $$

To estimate the individual $ \hat{\beta} $s on the right-hand side, they just use the ratio formula I mentioned above. Intuitively, this just says that the direct effect of the exposure on the outcome is the standard (total) causal effect of the exposure on the outcome minus the part of the effect contributed by $ X $ affecting $ Z $ affecting $ Y $. They also note that an alternate, very similar way to estimate these effects is two-stage least squares regression, where you would, e.g., regress the outcome on the fitted value of the exposure rather than on the IV. In their study, this approach produces similar results to the simpler approach, but they mention that another paper motivates two-stage least squares starting from a Pearl-ian causal model, so it’s potentially more principled.
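To check my own understanding of the formula, here is a minimal simulation sketch in Python. Everything in it is my assumption rather than anything taken from the paper: the effect sizes, the variable names, the choice of one instrument each for the exposure and the mediator (`g_x`, `g_z`), and the shared confounder `u`. It simulates a linear model, computes the three ratio estimates, and subtracts the indirect part from the total effect.

```python
import numpy as np

# Hypothetical "true" linear effects, chosen only for illustration.
TRUE_DIRECT = 0.2  # X => Y, not passing through Z
TRUE_XZ = 0.4      # X -> Z
TRUE_ZY = 0.3      # Z -> Y


def simulate(n=200_000, seed=0):
    """Simulate a linear mediation model with one instrument each for X and Z."""
    rng = np.random.default_rng(seed)
    g_x = rng.binomial(2, 0.3, size=n).astype(float)  # instrument for the exposure
    g_z = rng.binomial(2, 0.3, size=n).astype(float)  # instrument for the mediator
    u = rng.normal(size=n)                            # unobserved confounder
    x = 0.5 * g_x + u + rng.normal(size=n)
    z = 0.5 * g_z + TRUE_XZ * x + u + rng.normal(size=n)
    y = TRUE_DIRECT * x + TRUE_ZY * z + u + rng.normal(size=n)
    return g_x, g_z, x, z, y


def ratio_estimate(instrument, exposure, outcome):
    """Outcome-on-IV slope divided by exposure-on-IV slope.

    Both slopes share the denominator var(instrument), so it cancels and
    the ratio reduces to a ratio of covariances.
    """
    cov_outcome = np.cov(instrument, outcome)[0, 1]
    cov_exposure = np.cov(instrument, exposure)[0, 1]
    return cov_outcome / cov_exposure


def two_stage_estimate(instrument, exposure, outcome):
    """Single-instrument two-stage least squares: regress outcome on fitted exposure."""
    slope, intercept = np.polyfit(instrument, exposure, 1)
    fitted = intercept + slope * instrument
    second_stage_slope, _ = np.polyfit(fitted, outcome, 1)
    return second_stage_slope


g_x, g_z, x, z, y = simulate()
total = ratio_estimate(g_x, x, y)    # total effect of X on Y
x_to_z = ratio_estimate(g_x, x, z)   # effect of X on Z
z_to_y = ratio_estimate(g_z, z, y)   # effect of Z on Y
indirect = x_to_z * z_to_y
direct = total - indirect

print(f"total={total:.3f} indirect={indirect:.3f} direct={direct:.3f}")
```

With a single instrument, `two_stage_estimate(g_x, x, y)` is algebraically the same number as `ratio_estimate(g_x, x, y)`, which is one way to see why the two approaches should give similar results. I would expect them to differ only once there are several instruments or the exposure and mediator are fitted jointly, and I have not checked how the paper sets that up.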

Why is this important? #

We can imagine a lot of situations in which effects of exposures we’re interested in are mediated by intermediate factors. To do real science, we want to understand which parts of an exposure’s effect operate by changing intermediate factors vs. by directly changing the outcome.

Also, this paper does a good job of explaining its method from two different perspectives: the more statistical parametric one and the Pearl-ian non-parametric structural causal model one.

Questions #

If we condition on $ Z $ in a graph like the following, won’t we unblock the path between $ X $ and $ Y $?
What changes would we need to make to deal with continuous exposures but binary outcomes?
What’s the difference between the DAGs they show when discussing the regression-based approach and the SEM approach? Is it the presence of the undirected edge error terms in the DAG? I think it’s just that the Pearl framework is non-parametric.