Multivariable principal components generalized method of moments (PC-GMM) method
Source: R/AllGenerics.R, R/mr_mvpcgmm-methods.R
mr_mvpcgmm.Rd
The mr_mvpcgmm function performs multivariable Mendelian randomization via the principal components generalized method of moments (PC-GMM) method.
Usage
mr_mvpcgmm(
object,
nx,
ny,
cor.x = NULL,
r = NULL,
thres = 0.99,
robust = TRUE,
alpha = 0.05,
...
)
# S4 method for MRMVInput
mr_mvpcgmm(
object,
nx,
ny,
cor.x = NULL,
r = NULL,
thres = 0.99,
robust = TRUE,
alpha = 0.05,
...
)
Arguments
- object
An MRMVInput object.
- nx
Vector of sample sizes used to compute genetic associations with the exposure (one for each exposure).
- ny
The sample size used to compute genetic associations with the outcome.
- cor.x
Correlation matrix for exposures. Default is to assume the exposures are uncorrelated.
- r
The number of genetic principal components to be used to instrument the exposures. Default chooses r to explain 99% of variation in a sample weighted genetic correlation matrix (this can be varied by setting the thres parameter).
- thres
The threshold value of variation in the sample weighted genetic correlation matrix explained by the genetic principal components. The default value is 0.99, indicating that 99% of variation is explained by the principal components. Note that if r and thres are both specified, then r will take precedence and thres will be ignored.
- robust
Indicates whether overdispersion heterogeneity is accounted for in the model. Default is TRUE.
- alpha
The significance level used to calculate the confidence interval. The default value is 0.05.
- ...
Additional arguments to be passed to the optimization routines that calculate the GMM estimates and the overdispersion parameter.
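The relationship between r and thres can be illustrated with a minimal base R sketch. It is not the package's internal code: it uses an unweighted toy correlation matrix rather than the sample weighted matrix described above, the helper choose_r is hypothetical, and the use of ">=" at the threshold is an assumption.

```r
# Toy 6 x 6 correlation matrix with AR(1)-style decay (illustrative values only)
p <- 6
rho <- 0.9^abs(outer(seq_len(p), seq_len(p), "-"))

# Eigenvalues of the symmetric matrix, returned in decreasing order
eig <- eigen(rho, symmetric = TRUE)

# Cumulative proportion of variation explained by the leading components
cum_prop <- cumsum(eig$values) / sum(eig$values)

# Hypothetical helper: smallest number of components reaching the threshold
choose_r <- function(cum_prop, thres = 0.99) {
  which(cum_prop >= thres)[1]
}

r_default <- choose_r(cum_prop)                # threshold of 0.99
r_loose   <- choose_r(cum_prop, thres = 0.90)  # a lower threshold never needs more components
```

Supplying r directly bypasses this kind of selection, which is why thres is ignored when both are given.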
Value
The output from the function is an MVPCGMM object containing:
- Robust
TRUE if overdispersion heterogeneity was included in the model, FALSE otherwise.
- Exposure
A character vector with the names given to the exposures.
- Outcome
A character string with the name given to the outcome.
- Correlation
The matrix of genetic correlations.
- ExpCorrelation
TRUE if an exposure correlation matrix was specified, FALSE otherwise.
- CondFstat
A vector of conditional F-statistics (one for each exposure).
- Estimate
A vector of causal estimates.
- StdError
A vector of standard errors of the causal estimates.
- CILower
The lower bounds of the confidence intervals for the causal estimates, based on the estimated standard errors and the significance level provided.
- CIUpper
The upper bounds of the confidence intervals for the causal estimates, based on the estimated standard errors and the significance level provided.
- Overdispersion
The estimate of the overdispersion parameter.
- PCs
The number of genetic principal components used to instrument the exposures.
- Pvalue
The p-values associated with the estimates, calculated from the Wald statistic Estimate/StdError using a normal distribution.
- Alpha
The significance level used when calculating the confidence intervals.
- Heter.Stat
Heterogeneity statistic (Cochran's Q statistic) and associated p-value (for non-robust model only): the null hypothesis is that all genetic principal components estimate the same causal parameter; rejection of the null is an indication that one or more principal components may be pleiotropic.
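As a rough illustration of how Estimate, StdError, Alpha, CILower, CIUpper and Pvalue relate, the base R sketch below recomputes normal-based Wald intervals and two-sided p-values from the rounded estimates and standard errors printed in the example output on this page. It assumes symmetric normal-quantile intervals and is not taken from the package source; small differences in the third decimal place are expected because the inputs are already rounded.

```r
# Values copied from the printed example output (rounded to 3 decimal places)
estimate  <- c(exposure_1 = 1.662, exposure_2 = -1.712, exposure_3 = 0.419)
std_error <- c(exposure_1 = 0.689, exposure_2 = 0.998, exposure_3 = 0.352)
alpha     <- 0.05

# Two-sided normal critical value for a (1 - alpha) confidence interval
z_crit <- qnorm(1 - alpha / 2)

# Wald confidence interval bounds
ci_lower <- estimate - z_crit * std_error
ci_upper <- estimate + z_crit * std_error

# Two-sided p-value from the Wald statistic Estimate / StdError
p_value <- 2 * pnorm(-abs(estimate / std_error))

round(cbind(estimate, std_error, ci_lower, ci_upper, p_value), 3)
```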
Details
When a Mendelian randomization analysis is performed using correlated genetic variants from a single gene region, there is a tradeoff between using too few variants (and compromising on power) and using too many variants (in which case, estimates can be highly sensitive to small variation in the correlation matrix). This method performs dimension reduction on a weighted version of the genetic correlation matrix to form principal components based on the genetic variants, which are then used as instruments. It is recommended not to include very highly correlated variants in this method (say, r^2 > 0.95), but the method should cope well with variants correlated below this level.
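One way to check the recommendation on very highly correlated variants before calling the function is sketched below in base R. The correlation values and variant names are made up for illustration, and the screening step is a suggestion rather than part of the package.

```r
# Toy genetic correlation matrix for four variants (illustrative values only)
snps <- paste0("snp", 1:4)
rho <- matrix(c(1.00, 0.99, 0.30, 0.10,
                0.99, 1.00, 0.35, 0.12,
                0.30, 0.35, 1.00, 0.60,
                0.10, 0.12, 0.60, 1.00),
              nrow = 4, byrow = TRUE, dimnames = list(snps, snps))

# Squared correlations between variants
r2 <- rho^2

# Positions of variant pairs with r^2 > 0.95 (upper triangle, so each pair is listed once)
high <- which(r2 > 0.95 & upper.tri(r2), arr.ind = TRUE)

# Table of flagged pairs; one variant from each pair could be dropped before analysis
flagged <- data.frame(variant1 = snps[high[, "row"]],
                      variant2 = snps[high[, "col"]],
                      r2       = r2[high])
flagged
```

In this toy matrix only the pair snp1 and snp2 (r^2 = 0.9801) exceeds the threshold.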
This function runs a multivariable version of the PC-GMM method, which can be used when there are distinct exposures associated with variants at a single gene region. Phenotypic heterogeneity (that is, the genetic associations with the exposures are not collinear) at genomic loci encoding drug targets can be exploited by multivariable Mendelian randomization to provide insight on the pathways by which pharmacological interventions may affect disease risk.
This method provides two-sample multivariable Mendelian randomization estimates and associated confidence intervals that account for overdispersion heterogeneity in dimension-reduced genetic associations (when robust = TRUE).
References
Description of multivariable Mendelian randomization: Stephen Burgess, Simon G Thompson. Multivariable Mendelian Randomization: the use of pleiotropic genetic variants to estimate causal effects. American Journal of Epidemiology 2015; 181(4):251-260. doi: 10.1093/aje/kwu283.
Description of the PC-GMM method: "Robust use of phenotypic heterogeneity at drug target genes for mechanistic insights: application of cis-multivariable Mendelian randomization to GLP1R gene region" (Preprint).
Examples
mr_mvpcgmm(mr_mvinput(bx = cbind(ldlc, hdlc, trig), bxse = cbind(ldlcse, hdlcse, trigse),
by = chdlodds, byse = chdloddsse, correlation = diag(length(ldlc))), nx=rep(17723,3), ny=17723)
#>
#> Multivariable principal components generalized method of moments (PC-GMM) method
#>
#> Exposure correlation matrix not specified. Exposures are assumed to be uncorrelated.
#>
#> Number of principal components used : 20
#>
#> Robust model with overdispersion heterogeneity.
#>
#> ------------------------------------------------------------------
#>   Exposure Estimate Std Error 95% CI        p-value Cond F-stat
#> exposure_1    1.662     0.689  0.312, 3.012   0.016        19.6
#> exposure_2   -1.712     0.998 -3.667, 0.244   0.086         8.6
#> exposure_3    0.419     0.352 -0.272, 1.109   0.234        10.5
#> ------------------------------------------------------------------
#>
#> Overdispersion heterogeneity parameter estimate = 38.5163
#>
#> Heterogeneity test statistic = 15.6455
# Note this example does not use variants from a single gene region, and is provided
# to demonstrate that the code works, rather than to illustrate a recommended use case.