Multivariable Mendelian randomization • MendelianRandomization

An analytic approach when genetic variants are associated with multiple risk factors is multivariable Mendelian randomization, introduced in Burgess and Thompson (2015) “Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects”, doi: 10.1093/aje/kwu283.

There are two contexts in which we envision the method being used. First, when there is a set of related risk factors, such as lipid fractions. It is difficult to find genetic predictors of HDL-cholesterol that do not also predict LDL-cholesterol and/or triglycerides. Multivariable Mendelian randomization allows genetic variants to be associated with all the risk factors in the model provided that they do not influence the outcome via other pathways. Secondly, when there is a network of risk factors, typically a primary risk factor and a secondary risk factor or mediator. In both cases, multivariable estimates reflect direct causal effects - the result of intervening on the risk factor under analysis while keeping other risk factors in the model fixed.

Input format

As the multivariable method requires genetic associations with multiple risk factors, a different input function is needed: mr_mvinput, which creates an MRMVInput object:

MVMRInputObject <- mr_mvinput(bx = cbind(ldlc, hdlc, trig),
                              bxse = cbind(ldlcse, hdlcse, trigse),
                              by = chdlodds, 
                              byse = chdloddsse)

MVMRInputObject

##     SNP exposure_1.beta exposure_2.beta exposure_3.beta exposure_1.se
## 1 snp_1           0.026           0.002           0.016         0.004
## 2 snp_2          -0.044           0.005          -0.004         0.004
## 3 snp_3          -0.038           0.003          -0.009         0.004
##   exposure_2.se exposure_3.se outcome.beta outcome.se
## 1         0.004         0.008       0.0677     0.0286
## 2         0.004         0.008      -0.1625     0.0300
## 3         0.004         0.008      -0.1054     0.0310
##  [ reached 'max' / getOption("max.print") -- omitted 25 rows ]

MVMRInputObject.cor <- mr_mvinput(bx = cbind(ldlc, hdlc, trig),
                                  bxse = cbind(ldlcse, hdlcse, trigse),
                                  by = chdlodds, 
                                  byse = chdloddsse,
                                  correlation = diag(length(ldlc)))

Multivariable inverse-variance weighted method

The mr_mvivw command performs an extension of the inverse-variance weighted method for multivariable Mendelian randomization, implementing multivariable weighted linear regression (the standard IVW method method performs univariable weighted linear regression) with the intercept term set to zero. As with the standard IVW method, a correlation matrix can be specified. Multivariable Mendelian randomization requires the number of genetic variants to exceed the number of risk factors; if this is not the case, the method will return an error.

MVIVWObject <- mr_mvivw(MVMRInputObject, 
                         model = "default",
                         correl = FALSE,
                         correl.x = NULL,
                         nx = NA,
                         distribution = "normal",
                         alpha = 0.05)

MVIVWObject <- mr_mvivw(MVMRInputObject)

MVIVWObject

## 
## Multivariable inverse-variance weighted method
## (variants uncorrelated, random-effect model)
## 
## Number of Variants : 28 
## 
## ------------------------------------------------------------------
##    Exposure Estimate Std Error  95% CI       p-value
##  exposure_1    1.925     0.439  1.064, 2.786   0.000
##  exposure_2   -0.590     0.555 -1.677, 0.498   0.288
##  exposure_3    0.723     0.230  0.272, 1.174   0.002
## ------------------------------------------------------------------
## Residual standard error =  1.433 
## Heterogeneity test statistic = 51.3599 on 25 degrees of freedom, (p-value = 0.0014)

Conditional F statistics: The nx option specifies the sample size(s) for the genetic associations with the exposures (either a single value if these are all equal, or a vector if they differ). The correl.x option specifies the correlation between the genetic associations with the exposures. If the genetic assocations with the exposures are estimates in separate datasets, then this correlation matrix should be set to the identity matrix (this is the default option). This matrix cannot be calculated from summarized data; if it is unknown, then a sensitivity analysis for its value is recommended. The nx and correl.x options are not used in the calculation of estimates from the multivariable inverse-variance weighted method method, but only in the calculation of conditional F statistics. Conditional F statistics are the relevant measure of instrument strength for multivariable Mendelian randomization. For more detail, see Sanderson, Spiller, Bowden (2021) “Testing and correcting for weak and pleiotropic instruments in two-sample multivariable Mendelian randomization”, doi: 10.1002/sim.9133.

MVIVWObject.condF <- mr_mvivw(MVMRInputObject, nx = 17723)

MVIVWObject.condF

## 
## Multivariable inverse-variance weighted method
## (variants uncorrelated, random-effect model)
## 
## Number of Variants : 28 
## 
## ------------------------------------------------------------------
##    Exposure Estimate Std Error  95% CI       p-value Cond F-stat
##  exposure_1    1.925     0.439  1.064, 2.786   0.000        20.3
##  exposure_2   -0.590     0.555 -1.677, 0.498   0.288        12.9
##  exposure_3    0.723     0.230  0.272, 1.174   0.002        13.3
## ------------------------------------------------------------------
## Residual standard error =  1.433 
## Heterogeneity test statistic = 51.3599 on 25 degrees of freedom, (p-value = 0.0014)

In addition to the multivariable inverse-variance weighted method, several other multivariable estimation methods are available, which generally are multivariable extensions of univariable methods: multivariable MR-Egger, multivariable median-based method, multivariable lasso method, multivariable constrained maximum likelihood, multivariable GMM, and multivariable principal components GMM.

Multivariable MR-Egger

Multivariable MR-Egger is an extension of the MR-Egger method to deal with genetic variants that are associated with multiple risk factors, introduced in Rees et al (2017) “Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy” DOI: 10.1002/sim.7492

We implement the method using multivariable weighted linear regression. If the variants are correlated, the method is implemented using generalized weighted linear regression; this is hard coded using matrix algebra.

The causal estimate is obtained by regression of the associations with the outcome on the associations with the risk factors, with the intercept estimated and weights being the inverse-variances of the associations with the outcome.

MVEggerObject <- mr_mvegger(MVMRInputObject)

MVEggerObject

## 
## Multivariable MR-Egger method
## (variants uncorrelated, random-effect model)
## 
## Orientated to exposure : 1 
## Number of Variants : 28 
## ------------------------------------------------------------------
##     Exposure Estimate Std Error  95% CI        p-value
##   exposure_1    2.829     0.515  1.819,  3.839   0.000
##   exposure_2   -0.663     0.496 -1.636,  0.310   0.182
##   exposure_3    0.827     0.209  0.417,  1.237   0.000
##  (intercept)   -0.028     0.010 -0.049, -0.008   0.007
## ------------------------------------------------------------------
## Residual standard error =  1.280 
## Heterogeneity test statistic = 39.3421 on 24 degrees of freedom, (p-value = 0.0251)

Multivariable median-based method

Multivariable median method is similar to the univariable weighted median method, except that it is implemented using quantile regression. This is implemented by multivariable weighted quantile regression, with the quantile set to 0.5 (the median).

MVMedianObject <- mr_mvmedian(MVMRInputObject)

MVMedianObject

## 
## Multivariable median method 
## 
## Number of variants : 28 
## ------------------------------------------------------------------
##    Exposure Estimate Std Error  95% CI       p-value
##  exposure_1    2.116     0.482  1.172, 3.061   0.000
##  exposure_2    0.111     0.656 -1.176, 1.397   0.866
##  exposure_3    1.039     0.261  0.528, 1.550   0.000
## ------------------------------------------------------------------

Multivariable lasso method

The mr_mvlasso function performs the multivariable MR-Lasso method as described in Grant and Burgess (2020) “Pleiotropy robust methods for multivariable Mendelian randomization” DOI: 10.1002/sim.9156. This applies lasso-type penalization to the direct effects of genetic variants on the outcome. The causal estimates are described as post-lasso estimates, and are obtained by performing the multivariable IVW method using only those genetic variants that are identified as valid by the lasso procedure.

Multivariable MR-Lasso extends the multivariable IVW model to include an intercept term for each genetic variant. These intercept terms represent associations between the genetic variants and the outcome which bypass the risk factors. The regularized regression model is estimated by multivariable weighted linear regression where the intercept terms are subject to lasso-type penalization. The lasso penalization will tend to shrink the intercept terms corresponding to the valid instruments to zero.

The lasso penalty relies on a tuning parameter which controls the level of sparsity. The default is to use a heterogeneity stopping rule, but a fixed value may be specified.

As part of the analysis, the genetic variants are orientated so that all of the associations with one of the risk factors are positive (the first risk factor is used by default). Re-orientation of the genetic variants is performed automatically as part of the function.

The MR-Lasso method is performed in two steps. First, a regularized regression model is fitted, and some genetic variants are identified as valid instruments. Second, causal effects are estimated using standard multivariable IVW with only the valid genetic variants. The post-lasso method will be performed as long as the number of genetic variants identified as valid instruments is greater than the number of risk factors. The default heterogeneity stopping rule will always return more genetic variants as valid instruments than risk factors for identification. The main estimates given by the method are the post-lasso estimates. However, parameter estimates from the regularized regression model used to identify invalid variants are also provided for completeness.

If a substantial proportion of genetic variants are removed from the analysis, the multivariable MR-Lasso method may give a false impression of confidence in the causal estimate due to homogeneity of the variant-specific causal estimates amongst the remaining variants. However, it is not reasonable to claim that there is strong evidence for a causal effect after a large number of variants with heterogeneous estimates have been removed from the analysis.

MVLassoObject <- mr_mvlasso(MVMRInputObject)

MVLassoObject

## 
## Multivariable MR-Lasso method 
## 
## Orientated to exposure : 1 
## Number of variants : 28 
## Number of valid instruments : 25 
## Tuning parameter : 0.4641104 
## ------------------------------------------------------------------
##    Exposure Estimate Std Error  95% CI       p-value
##  exposure_1    1.726     0.383  0.976, 2.477   0.000
##  exposure_2   -0.126     0.441 -0.991, 0.738   0.775
##  exposure_3    0.858     0.185  0.496, 1.220   0.000
## ------------------------------------------------------------------

Multivariable constrained maximum likelihood

Multivariable MRcML (MVMRcML) is an extension of MRcML to deal with multiple exposures, as described in Lin et al (2023) “Robust multivariable Mendelian randomization based on constrained maximum likelihood” DOI 10.1016/j.ajhg.2023.02.014 . It is robust to both correlated and uncorrelated pleiotropy as with the univariable equivalent.

In practice, the data perturbation (DP) version is preferred in practice for a more robust inference as it can account for the uncertainty in model selection. However, it may take a longer time especially when the number of IVs is large (so the range of K_vec can be large too). One strategy is to try a small range of K (the number of invalid IVs) first (with a small num_pert), then adjust it if the number of selected invalid IVs fall close to the boundary. You can also use other methods, e.g. mr_mvlasso, to get a rough sense of the number of invalid IVs.

Similar to mr_cML, multiple random starting points could be used to find a global minimum.

MVcMLObject <- mr_mvcML(MVMRInputObject, n = 17723)

MVcMLObject

## 
## Multivariable MRcML method 
## 
## Number of variants : 28 
## Results for:  MVMRcML-DP 
## Number of data perturbations with successful convergence:  200 
## ------------------------------------------------------------------
##    Exposure Estimate Std Error  95% CI       p-value
##  exposure_1    1.927     0.613  0.726, 3.127   0.002
##  exposure_2   -0.350     0.671 -1.666, 0.966   0.602
##  exposure_3    0.822     0.272  0.289, 1.354   0.002
## ------------------------------------------------------------------

Multivariable GMM

Multivariable Mendelian randomization via the generalized method of moments method (GMM) is implemented by the mr_mvgmm command. The key advantage of the GMM method is that estimates are more robust to weak instrument bias and/or measurement error in the exposures compared with the standard IVW method. Weak instrument bias is particularly important in the multivariable setting, as bias in the multivariable setting can be in any direction, particularly if instrument strength varies between the exposures.

Unlike the multivariable IVW method (mr_mvivw), the multivariable GMM method requires the sample sizes for genetic associations with the exposure, and the sample size for the genetic associations with the outcome. The method accounts for overdispersion heterogeneity in genetic variant-outcome associations (using the default option robust = TRUE).

MVGMMObject <- mr_mvgmm(MVMRInputObject, nx=rep(17723,3), ny=17723)

MVGMMObject

## 
## Multivariable generalized method of moments (GMM) method
## 
## Exposure correlation matrix not specified. Exposures are assumed to be uncorrelated.
## 
## Robust model with overdispersion heterogeneity.
## 
## ------------------------------------------------------------------
##    Exposure Estimate Std Error  95% CI       p-value Cond F-stat
##  exposure_1    1.830     0.451  0.947, 2.713   0.000        20.3
##  exposure_2   -0.722     0.580 -1.858, 0.415   0.213        12.9
##  exposure_3    0.686     0.239  0.217, 1.155   0.004        13.3
## ------------------------------------------------------------------
## 
## Overdispersion heterogeneity parameter estimate = 17.369 
## 
## Heterogeneity test statistic = 24.9273

Multivariable principal components GMM

Multivariable principal components GMM is a multivariable version of the PC-GMM method, as described in Patel et al. “Robust use of phenotypic heterogeneity at drug target genes for mechanistic insights: application of cis-multivariable Mendelian randomization to GLP1R gene region” DOI 10.1101/2023.07.20.23292958

This can be used when there are distinct exposures associated with variants at a single gene region. Phenotypic heterogeneity (that is, the genetic associations with the exposures are not collinear) at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight on the pathways by which pharmacological interventions may affect disease risk.

When a Mendelian randomization analysis is performed using correlated genetic variants from a single gene region, there is a tradeoff between using too few variants (and compromising on power) and using too many variants (in which case, estimates can be highly sensitive to small variation in the correlation matrix). This method performs dimension reduction on a weighted version of the genetic correlation matrix to form principal components based on the genetic variants, which are then used as instruments. It is recommended not to include very highly correlated variants in this method (say, r^2 > 0.95), but the method should cope well with variants correlated below this level.

This method provides two-sample multivariable Mendelian randomization estimates and associated confidence intervals that account for overdispersion heterogeneity in dimension-reduced genetic associations (when robust = TRUE).

MVpcGMMObject <- mr_mvpcgmm(MVMRInputObject.cor, nx=rep(17723,3), ny=17723)

MVpcGMMObject

## 
## Multivariable principal components generalized method of moments (PC-GMM) method
## 
## Exposure correlation matrix not specified. Exposures are assumed to be uncorrelated.
## 
## Number of principal components used : 20 
## 
## Robust model with overdispersion heterogeneity.
## 
## ------------------------------------------------------------------
##    Exposure Estimate Std Error  95% CI       p-value Cond F-stat
##  exposure_1    1.662     0.689  0.312, 3.012   0.016        19.6
##  exposure_2   -1.712     0.998 -3.667, 0.244   0.086         8.6
##  exposure_3    0.419     0.352 -0.272, 1.109   0.234        10.5
## ------------------------------------------------------------------
## 
## Overdispersion heterogeneity parameter estimate = 38.5163 
## 
## Heterogeneity test statistic = 15.6455

Note that this mr_mvpcgmm example does not use variants from a single gene region. It is provided to demonstrate that the code works, rather than to illustrate a recommended use case.

Multivariable inverse-variance weighted method with measurement error

The mr_mvivwme function performs multivariable Mendelian randomization via the inverse-variance method with measurement error, to account for measurement error in the genetic associations with the exposure traits. See Zhu, Jiazheng, Stephen Burgess, and Andrew J. Grant. Bias in Multivariable Mendelian Randomization Studies Due to Measurement Error on Exposures, 2022. DOI https://doi.org/10.48550/arXiv.2203.08668.

mr_mvivwme(mr_mvinput(bx = cbind(ldlc, hdlc, trig), bxse = cbind(ldlcse, hdlcse, trigse),
   by = chdlodds, byse = chdloddsse))

## 
## Multivariable inverse-variance weighted method accounting for measurement error
## (variants uncorrelated, random-effect model)
## 
## Number of Variants : 28 
## ------------------------------------------------------------------
##    Exposure Estimate Std Error  95% CI       p-value
##  exposure_1    2.025     0.504  1.038, 3.012   0.000
##  exposure_2   -0.561     0.676 -1.886, 0.765   0.407
##  exposure_3    0.738     0.281  0.188, 1.288   0.009
## ------------------------------------------------------------------
## Residual standard error =  1.436 
## Heterogeneity test statistic = 51.5230 on 25 degrees of freedom, (p-value = 0.0014)

Final note of caution

Particularly with the development of Mendelian randomization with summarized data, two-sample Mendelian randomization (where associations with the risk factor and with the outcome are taken from separate datasets), and bioinformatic tools for obtaining and analysing summarized data (including this package), Mendelian randomization is becoming increasingly accessible as a tool for answering epidemiological questions. This is undoubtably a Good Thing.

However, it is important to remember that the difficult part of a Mendelian randomization analysis is not the computational method, but deciding what should go into the analysis: which risk factor, which outcome, and (most importantly) which genetic variants. Hopefully, the availability of these tools will enable less attention to be paid to the mechanics of the analysis, and more attention to these choices.

The note of caution is that tools that make Mendelian randomization simple to perform run the risk of encouraging large numbers of speculative analyses to be performed in an unprincipled way. It is important that Mendelian randomization is not performed in a way that avoids critical thought. In releasing this package, the hope is that it will lead to more comprehensive and more reproducible causal inferences from Mendelian randomization, and not simply add more noise to the literature.