Genetic instrumental variable regression: Explaining socioeconomic and health outcomes in nonexperimental data

Thomas A. DiPrete*, Casper A.P. Burik, Philipp D. Koellinger

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Identifying causal effects in nonexperimental data is an enduring challenge. One proposed solution that recently gained popularity is the idea to use genes as instrumental variables [i.e., Mendelian randomization (MR)]. However, this approach is problematic because many variables of interest are genetically correlated, which implies the possibility that many genes could affect both the exposure and the outcome directly or via unobserved confounding factors. Thus, pleiotropic effects of genes are themselves a source of bias in nonexperimental data that would also undermine the ability of MR to correct for endogeneity bias from nongenetic sources. Here, we propose an alternative approach, genetic instrumental variable (GIV) regression, that provides estimates for the effect of an exposure on an outcome in the presence of pleiotropy. As a valuable byproduct, GIV regression also provides accurate estimates of the chip heritability of the outcome variable. GIV regression uses polygenic scores (PGSs) for the outcome of interest which can be constructed from genome-wide association study (GWAS) results. By splitting the GWAS sample for the outcome into nonoverlapping subsamples, we obtain multiple indicators of the outcome PGSs that can be used as instruments for each other and, in combination with other methods such as sibling fixed effects, can address endogeneity bias from both pleiotropy and the environment. In two empirical applications, we demonstrate that our approach produces reasonable estimates of the chip heritability of educational attainment (EA) and show that standard regression and MR provide upwardly biased estimates of the effect of body height on EA.
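The split-sample idea in the abstract, where two polygenic scores built from nonoverlapping GWAS subsamples serve as instruments for each other, can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions and not the authors' GIV estimator: it has no exposure variable and no pleiotropy, and it shows only how instrumenting one noisy score with another removes the attenuation caused by estimation error in the scores. The function name, the parameter values, and the additive-noise model for the scores are all hypothetical.

```python
import numpy as np


def simulate_split_sample_iv(n=200_000, beta=0.5, noise_sd=1.0, seed=0):
    """Compare OLS and IV slopes when the genetic score is measured with error.

    Assumed (hypothetical) data-generating process:
      g  : true, unobserved genetic score
      y  : outcome, y = beta * g + e
      s1 : PGS from GWAS subsample 1, s1 = g + u1
      s2 : PGS from GWAS subsample 2, s2 = g + u2
    u1 and u2 are independent because the subsamples do not overlap.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)
    y = beta * g + rng.standard_normal(n)
    s1 = g + noise_sd * rng.standard_normal(n)
    s2 = g + noise_sd * rng.standard_normal(n)

    # OLS slope of y on s1: attenuated toward zero by the noise in s1.
    ols = np.cov(y, s1)[0, 1] / np.var(s1, ddof=1)

    # IV (Wald) slope using s2 as the instrument for s1: the independent
    # noise terms drop out of both covariances, so the ratio targets beta.
    iv = np.cov(y, s2)[0, 1] / np.cov(s1, s2)[0, 1]
    return ols, iv


if __name__ == "__main__":
    ols_slope, iv_slope = simulate_split_sample_iv()
    print(f"OLS slope: {ols_slope:.3f}")
    print(f"IV slope:  {iv_slope:.3f}")
```

Under these assumptions the OLS slope converges to beta / (1 + noise_sd^2), while the IV slope converges to beta. Handling an exposure, pleiotropy, or sibling fixed effects, as the paper describes, would require a fuller model than this sketch.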

Original language: English
Pages (from-to): E4970-E4979
Number of pages: 10
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 115
Issue number: 22
Publication status: Published - 29 May 2018

Funding

We are very grateful to Richard Karlsson Linnér for help with the GWAS analyses in the UK Biobank, to Aysu Okbay for providing us with subsets of the GWAS meta-analysis on educational attainment, and to S. Fleur Meddens for constructing genetic principal components in the HRS data. We thank Daniel J. Benjamin, Jonathan P. Beauchamp, Benjamin Domingue, Lisbeth Trille Loft, Niels Rietveld, Eric Slob, Patrick Turley, Hans van Kippersluis, and Tian Zheng for productive discussions and comments on earlier versions of the manuscript. Furthermore, we thank Elliot Tucker-Drob for directing us to the necessary correction of the heritability estimate in our model. This research was facilitated by the SSGAC and by the research group on genetic and social causes of life chances at the Zentrum für interdisciplinäre Forschung Bielefeld. Data analyses made use of the UK Biobank resource under Application 11425. We acknowledge data access from the GIANT Consortium. We used data from the HRS, which is supported by the National Institute on Aging (NIA U01AG009740, RC2 AG036495, and RC4 AG039029). This study was supported by funding from a European Research Council Consolidator Grant (647648 EdGe) (to P.D.K.). HRS genotype data can be accessed via the database of Genotypes and Phenotypes (dbGaP, accession no. phs000428.v1.p1). Researchers who wish to link genetic data with other HRS measures that are not in dbGaP, such as educational attainment, must apply for access from HRS.

Funders (funder number)
National Institute on Aging (RC4 AG039029, U01AG009740, RC2 AG036495)
Division of Antarctic Sciences
Scottish Salmon Growers Association
Washington Academy of Sciences
European Research Council (647648 EdGe)

    Keywords

    • Causal effects
    • Genetic instrumental variables
    • Genome-wide association studies
    • Pleiotropy
    • Polygenic scores
