The function pheno_input extracts summarized data on associations with named exposure and outcome variables from PhenoScanner.
Arguments
- snps
The names (rsid) of the genetic variants to be included in the analysis.
- exposure
The name of the exposure variable.
- pmidE
The PubMed ID (PMID) of the publication in which the genetic association estimates with the exposure were originally reported. Some variables are reported by multiple consortia (for example, associations with coronary artery disease by CARDIoGRAM in 2011 [PMID:21378990], by CARDIoGRAMplusC4D in 2013, and again by CARDIoGRAMplusC4D in 2015 [PMID:26343387]). Equally, some publications report associations with multiple variables (for example, CARDIoGRAMplusC4D in 2015 [PMID:26343387] reported associations with coronary artery disease and with myocardial infarction). By providing the variable name and the PubMed ID, the set of associations is (almost) uniquely identified.
- ancestryE
The ancestry of individuals in which estimates were obtained. A small number of studies reported genetic association estimates for a single variable in a single publication for multiple ethnicities (for example, associations with log(eGFR creatinine) from CKD-Gen in 2016 [PMID:26831199] were reported for both Europeans and Africans). The combination of exposure name, PubMed ID, and ancestry uniquely defines the set of associations. Providing the ancestry also reminds analysts of the additional complication of conducting Mendelian randomization when associations with the exposure and with the outcome are in individuals of different ancestry. Most association estimates are obtained in "European" or "Mixed" populations, although some are obtained in "African", "Asian", or "Hispanic" populations.
- outcome
The name of the outcome variable.
- pmidO
The PubMed ID of the publication in which the genetic association estimates with the outcome were originally reported.
- ancestryO
The ancestry of individuals in which genetic association estimates with the outcome were obtained.
- correl
The correlations between the genetic variants. If this is not specified, then the genetic variants are assumed to be uncorrelated. Note that for the correlations to reference the correct variants, the list of genetic variants needs to be in alphabetical order.
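The note on alphabetical ordering can be illustrated with a minimal sketch in base R. The rsids and correlation values below are made up for illustration, and sorting with order() is an assumption about what "alphabetical order" means here. The call to pheno_input at the end is left commented out because it needs access to PhenoScanner; the trait strings, the PMIDs passed as character strings, and the "Mixed" ancestry for the outcome are likewise assumptions and would need to match the database labels.

```r
# Hypothetical variants and pairwise correlations (illustrative values only).
snps <- c("rs2479409", "rs12916", "rs217434")
rho <- matrix(c(1.0, 0.1, 0.2,
                0.1, 1.0, 0.3,
                0.2, 0.3, 1.0),
              nrow = 3, dimnames = list(snps, snps))

# Put the variants in alphabetical order and reorder the rows and
# columns of the correlation matrix in the same way, so that each
# entry still refers to the intended pair of variants.
ord <- order(snps, method = "radix")
snps_sorted <- snps[ord]
rho_sorted <- rho[ord, ord]

# Not run: requires the MendelianRandomization package and a working
# connection to PhenoScanner. Argument values are assumptions.
# obj <- pheno_input(snps = snps_sorted,
#                    exposure = "log(eGFR creatinine)",
#                    pmidE = "26831199", ancestryE = "European",
#                    outcome = "Coronary artery disease",
#                    pmidO = "26343387", ancestryO = "Mixed",
#                    correl = rho_sorted)
```

Reordering rows and columns with the same index vector keeps the matrix symmetric and keeps its dimnames aligned with snps_sorted.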
Value
The output of the pheno_input function is an MRInput object that can be used directly in any of the estimation functions (such as mr_ivw) or in the plotting function mr_plot. The output contains:
- bx
The genetic associations with the exposure.
- bxse
The corresponding standard errors.
- by
The genetic associations with the outcome.
- byse
The corresponding standard errors.
- correlation
The matrix of genetic correlations as specified by the user.
- exposure
A character string giving the name of the exposure as provided in the PhenoScanner database.
- outcome
A character string giving the name of the outcome as provided in the PhenoScanner database.
- snps
A vector of character strings with the names of the genetic variants.
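As a rough sketch of how such an object is used downstream, the example below builds an MRInput object offline with the package's mr_input constructor, as a stand-in for the result of pheno_input, and passes it to mr_ivw. All numbers, variant names, and trait labels are made up for illustration. The package calls are guarded so that the snippet also runs where MendelianRandomization is not installed.

```r
# Made-up summary statistics for three variants (illustrative only).
snps <- c("rs12916", "rs217434", "rs2479409")
bx   <- c(0.10, 0.20, 0.15)    # associations with the exposure
bxse <- c(0.01, 0.02, 0.01)    # their standard errors
by   <- c(0.05, 0.10, 0.075)   # associations with the outcome
byse <- c(0.02, 0.03, 0.02)    # their standard errors

if (requireNamespace("MendelianRandomization", quietly = TRUE)) {
  library(MendelianRandomization)
  # mr_input() creates an MRInput object by hand; here it stands in
  # for the object that pheno_input would return.
  obj <- mr_input(bx = bx, bxse = bxse, by = by, byse = byse,
                  exposure = "Exposure X", outcome = "Outcome Y",
                  snps = snps)
  print(mr_ivw(obj))   # inverse-variance weighted estimate
  # mr_plot(obj)       # plotting function; left commented out here
}
```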
Details
The PhenoScanner bioinformatic tool is a curated database of publicly available results from large-scale genetic association studies. Queries can be made for individual genetic variants (SNPs and small indels), or for multiple variants in a single batch query. These association estimates and their standard errors can be used in Mendelian randomization analyses.
The phenoscanner command is included in the MendelianRandomization package with permission of James Staley. The function is also available in a standalone package from GitHub: https://github.com/phenoscanner/phenoscanner.
References
James R Staley, James Blackshaw, Mihir A Kamat, Steve Ellis, Praveen Surendran, Benjamin B Sun, Dirk S Paul, Daniel Freitag, Stephen Burgess, John Danesh, Robin Young, and Adam S Butterworth. PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 2016. doi: 10.1093/bioinformatics/btw373.