Multivariable principal components generalized method of moments (PC-GMM) method

The mr_mvpcgmm function performs multivariable Mendelian randomization using principal components generalized method of moments (PC-GMM) estimation.

Usage

mr_mvpcgmm(
  object,
  nx,
  ny,
  cor.x = NULL,
  r = NULL,
  thres = 0.99,
  robust = TRUE,
  alpha = 0.05,
  ...
)

# S4 method for MRMVInput
mr_mvpcgmm(
  object,
  nx,
  ny,
  cor.x = NULL,
  r = NULL,
  thres = 0.99,
  robust = TRUE,
  alpha = 0.05,
  ...
)

Arguments

object: An MRMVInput object.
nx: Vector of sample sizes used to compute genetic associations with the exposure (one for each exposure).
ny: The sample size used to compute genetic associations with the outcome.
cor.x: Correlation matrix for exposures. Default is to assume the exposures are uncorrelated.
r: The number of genetic principal components used to instrument the exposures. By default, r is chosen as the smallest number of principal components that explains 99% of the variation in a sample-weighted genetic correlation matrix; this proportion can be changed with the thres parameter.
thres: The proportion of variation in the sample-weighted genetic correlation matrix that the genetic principal components must explain. The default value is 0.99, meaning that the principal components explain 99% of the variation. If r and thres are both specified, r takes precedence and thres is ignored.
robust: Indicates whether overdispersion heterogeneity is accounted for in the model. Default is TRUE.
alpha: The significance level used to calculate the confidence interval. The default value is 0.05.
...: Additional arguments to be passed to the optimization routines to calculate GMM estimates and overdispersion parameter.
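The rule for choosing r from thres, and the precedence of an explicit r, can be sketched in base R as below. This is a hypothetical helper for illustration, not the package's internal code: it assumes the caller has already built the sample-weighted genetic correlation matrix, and it simply picks the smallest number of leading eigenvalues whose cumulative share of the total reaches thres.

```r
# Hypothetical helper illustrating how thres and r interact.
# ASSUMPTION: `weighted_cor` is a sample-weighted genetic correlation matrix
# built by the caller; this is not the package's internal implementation.
choose_r <- function(weighted_cor, thres = 0.99, r = NULL) {
  if (!is.null(r)) return(r)                 # an explicit r overrides thres
  ev <- eigen(weighted_cor, symmetric = TRUE, only.values = TRUE)$values
  prop <- cumsum(ev) / sum(ev)               # cumulative proportion explained
  which(prop >= thres)[1]                    # smallest r reaching thres
}

# Toy matrix with eigenvalues 50, 30, 15, 4, 1
# (cumulative proportions 50%, 80%, 95%, 99%, 100%)
m <- diag(c(50, 30, 15, 4, 1))
choose_r(m, thres = 0.9)           # 3
choose_r(m, thres = 0.9, r = 2)    # 2: r takes precedence over thres
```

Raising thres towards 1 retains more components, which moves the analysis back towards using all (possibly highly correlated) variants.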

Value

The output from the function is an MVPCGMM object containing:

Robust: TRUE if overdispersion heterogeneity was included in the model, FALSE otherwise.
Exposure: A character vector with the names given to the exposures.
Outcome: A character string with the name given to the outcome.
Correlation: The matrix of genetic correlations.
ExpCorrelation: TRUE if an exposure correlation matrix was specified, FALSE otherwise.
CondFstat: A vector of conditional F-statistics (one for each exposure).
Estimate: A vector of causal estimates.
StdError: A vector of standard errors of the causal estimates.
CILower: The lower bounds of the confidence intervals for the causal estimates, based on the estimated standard errors and the significance level provided.
CIUpper: The upper bounds of the confidence intervals for the causal estimates, based on the estimated standard errors and the significance level provided.
Overdispersion: The estimate of the overdispersion parameter.
PCs: The number of genetic principal components used to instrument the exposures.
Pvalue: The p-values associated with the estimates, obtained from the Wald statistic Estimate/StdError using a normal distribution.
Alpha: The significance level used when calculating the confidence intervals.
Heter.Stat: Heterogeneity statistic (Cochran's Q statistic) and associated p-value (for non-robust model only): the null hypothesis is that all genetic principal components estimate the same causal parameter; rejection of the null is an indication that one or more principal components may be pleiotropic.
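The CILower, CIUpper and Pvalue entries follow from Estimate, StdError and Alpha via the usual normal-based Wald formulas. The sketch below recomputes them in base R; `wald_summary` is a hypothetical helper, and its inputs are the rounded values printed in the example output further down, so the results agree with that table only up to rounding.

```r
# Hypothetical helper: normal-based Wald confidence intervals and p-values
# computed from causal estimates and their standard errors.
wald_summary <- function(estimate, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)  # about 1.96 when alpha = 0.05
  data.frame(
    Estimate = estimate,
    StdError = se,
    CILower  = estimate - z * se,
    CIUpper  = estimate + z * se,
    Pvalue   = 2 * pnorm(-abs(estimate / se))  # two-sided Wald test
  )
}

# Rounded estimates and standard errors taken from the example output
res <- wald_summary(estimate = c(1.662, -1.712, 0.419),
                    se       = c(0.689, 0.998, 0.352))
round(res, 3)
```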

Details

When a Mendelian randomization analysis is performed using correlated genetic variants from a single gene region, there is a tradeoff between using too few variants (and compromising on power) and using too many variants (in which case, estimates can be highly sensitive to small variation in the correlation matrix). This method performs dimension reduction on a weighted version of the genetic correlation matrix to form principal components based on the genetic variants, which are then used as instruments. It is recommended not to include very highly correlated variants in this method (say, r^2 > 0.95), but the method should cope well with variants correlated below this level.
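The dimension-reduction step can be illustrated with a short base R sketch: eigendecompose a weighted genetic correlation matrix, keep the leading r eigenvectors, and project the variant-level associations onto them so that the r principal components act as instruments. The function `project_to_pcs` and its weighting scheme (summing |bx| across exposures and dividing by byse) are assumptions made for illustration; the exact weighting used inside mr_mvpcgmm may differ.

```r
# Hypothetical helper: project variant-level associations onto the leading
# principal components of a weighted genetic correlation matrix.
# ASSUMPTION: variants are weighted by sum_k |bx[j, k]| / byse[j]; the exact
# weighting used inside mr_mvpcgmm may differ.
project_to_pcs <- function(bx, by, byse, rho, r) {
  bx <- as.matrix(bx)                       # J variants x K exposures
  w <- rowSums(abs(bx)) / byse              # assumed variant weights
  psi <- rho * outer(w, w)                  # weighted correlation matrix
  U <- eigen(psi, symmetric = TRUE)$vectors[, seq_len(r), drop = FALSE]
  sigma_y <- rho * outer(byse, byse)        # covariance of outcome associations
  list(
    bx    = crossprod(U, bx),               # r x K transformed exposure associations
    by    = drop(crossprod(U, by)),         # length-r transformed outcome associations
    omega = t(U) %*% sigma_y %*% U          # r x r covariance of transformed by
  )
}

# Toy data (simulated, for illustration only)
set.seed(1)
J <- 6; K <- 2
rho  <- 0.4^abs(outer(1:J, 1:J, "-"))       # correlation between variants
bx   <- matrix(rnorm(J * K, 0.1, 0.05), J, K)
by   <- rnorm(J, 0.05, 0.02)
byse <- rep(0.02, J)
pcs  <- project_to_pcs(bx, by, byse, rho, r = 3)
dim(pcs$bx)   # 3 2
```

When r equals the number of variants, the projection is an orthogonal rotation and loses no information; a smaller r discards the directions with the least weighted variation, which is what limits sensitivity to small changes in the correlation matrix.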

This function runs a multivariable version of the PC-GMM method, which can be used when there are distinct exposures associated with variants at a single gene region. Phenotypic heterogeneity (that is, the genetic associations with the exposures are not collinear) at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight into the pathways by which pharmacological interventions may affect disease risk.

This method provides two-sample multivariable Mendelian randomization estimates and associated confidence intervals that account for overdispersion heterogeneity in dimension-reduced genetic associations (when robust = TRUE).

References

Description of multivariable Mendelian randomization: Stephen Burgess, Simon G Thompson. Multivariable Mendelian Randomization: the use of pleiotropic genetic variants to estimate causal effects. American Journal of Epidemiology 2015; 181(4):251-260. doi: 10.1093/aje/kwu283.

Description of the PC-GMM method: "Robust use of phenotypic heterogeneity at drug target genes for mechanistic insights: application of cis-multivariable Mendelian randomization to GLP1R gene region" (Preprint).

Examples

mr_mvpcgmm(mr_mvinput(bx = cbind(ldlc, hdlc, trig), bxse = cbind(ldlcse, hdlcse, trigse),
   by = chdlodds, byse = chdloddsse, correlation = diag(length(ldlc))), nx=rep(17723,3), ny=17723)
#> 
#> Multivariable principal components generalized method of moments (PC-GMM) method
#> 
#> Exposure correlation matrix not specified. Exposures are assumed to be uncorrelated.
#> 
#> Number of principal components used : 20 
#> 
#> Robust model with overdispersion heterogeneity.
#> 
#> ------------------------------------------------------------------
#>    Exposure Estimate Std Error  95% CI       p-value Cond F-stat
#>  exposure_1    1.662     0.689  0.312, 3.012   0.016        19.6
#>  exposure_2   -1.712     0.998 -3.667, 0.244   0.086         8.6
#>  exposure_3    0.419     0.352 -0.272, 1.109   0.234        10.5
#> ------------------------------------------------------------------
#> 
#> Overdispersion heterogeneity parameter estimate = 38.5163 
#> 
#> Heterogeneity test statistic = 15.6455
# Note: this example does not use variants from a single gene region; it is provided
#  to demonstrate that the code works, not to illustrate a recommended use case.