Guidelines for Evaluating the Comparability of Down-Sampled GWAS Summary Statistics

Camille M. Williams*, Holly Poore, Peter T. Tanksley, Hyeokmoon Kweon, Natasia S. Courchesne-Krak, Diego Londono-Correa, Travis T. Mallard, Peter Barr, Philipp D. Koellinger, Irwin D. Waldman, Sandra Sanchez-Roige, K. Paige Harden, Abraham A. Palmer, Danielle M. Dick*, Richard Karlsson Linnér*

*Corresponding author for this work

Research output: Contribution to Journal, Article, Academic, peer-review

Abstract

Proprietary genetic datasets are valuable for boosting the statistical power of genome-wide association studies (GWASs), but their use can restrict investigators from publicly sharing the resulting summary statistics. Although researchers can resort to sharing down-sampled versions that exclude restricted data, down-sampling reduces power and might change the genetic etiology of the phenotype being studied. These problems are further complicated when using multivariate GWAS methods, such as genomic structural equation modeling (Genomic SEM), that model genetic correlations across multiple traits. Here, we propose a systematic approach to assess the comparability of GWAS summary statistics that include versus exclude restricted data. Illustrating this approach with a multivariate GWAS of an externalizing factor, we assessed the impact of down-sampling on (1) the strength of the genetic signal in univariate GWASs, (2) the factor loadings and model fit in multivariate Genomic SEM, (3) the strength of the genetic signal at the factor level, (4) insights from gene-property analyses, (5) the pattern of genetic correlations with other traits, and (6) polygenic score analyses in independent samples. For the externalizing GWAS, although down-sampling resulted in a loss of genetic signal and fewer genome-wide significant loci, the factor loadings and model fit, gene-property analyses, genetic correlations, and polygenic score analyses were found to be robust. Given the importance of data sharing for the advancement of open science, we recommend that investigators who generate and share down-sampled summary statistics report these analyses as accompanying documentation to support other researchers’ use of the summary statistics.
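
As a rough illustration of comparison (1), the strength of the genetic signal, the sketch below contrasts two sets of per-SNP z-scores: one from a full GWAS and one from a down-sampled GWAS of the same SNPs. It is a minimal sketch under stated assumptions, not the authors' pipeline. The function names, the toy z-scores, and the "retained excess chi-square" ratio are hypothetical choices made for illustration. The 5e-8 cutoff is the conventional genome-wide significance threshold. The sketch counts significant SNPs rather than independent loci, which would additionally require clumping.

```python
from statistics import NormalDist, fmean

GWS_P = 5e-8  # conventional genome-wide significance threshold
GWS_Z = NormalDist().inv_cdf(1 - GWS_P / 2)  # two-sided |z| cutoff, about 5.45


def signal_summary(z_scores):
    """Return the mean chi-square (z^2) and the count of genome-wide significant SNPs."""
    chi2 = [z * z for z in z_scores]
    n_sig = sum(1 for z in z_scores if abs(z) > GWS_Z)
    return {"mean_chi2": fmean(chi2), "n_gws_snps": n_sig}


def compare_signal(full_z, down_z):
    """Compare full and down-sampled z-scores for the same SNPs in the same order."""
    if len(full_z) != len(down_z):
        raise ValueError("z-score lists must cover the same SNPs in the same order")
    full = signal_summary(full_z)
    down = signal_summary(down_z)
    # Under the null, the mean chi-square is about 1, so the excess over 1
    # serves here as a crude (assumed) measure of genetic signal.
    full_excess = full["mean_chi2"] - 1
    if full_excess > 0:
        retained = (down["mean_chi2"] - 1) / full_excess
    else:
        retained = float("nan")
    return {"full": full, "down_sampled": down, "retained_excess_chi2": retained}


# Hypothetical toy z-scores for five SNPs (not data from the paper).
full_z = [6.0, -5.6, 2.0, 0.5, -1.0]
down_z = [5.0, -5.5, 1.8, 0.4, -0.9]
result = compare_signal(full_z, down_z)
print(result)
```

In practice the z-scores would be read from the two summary-statistics files and aligned by SNP identifier and effect allele before comparison; that step is omitted here.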

Original language: English
Pages (from-to): 404-415
Number of pages: 12
Journal: Behavior Genetics
Volume: 53
Issue number: 5-6
Publication status: Published - Nov 2023

Bibliographical note

Funding Information:
This research was conducted by the Externalizing Consortium. The Externalizing Consortium has been supported by the National Institute on Alcohol Abuse and Alcoholism (R01AA015416 – administrative supplement to DMD), and the National Institute on Drug Abuse (R01DA050721 to DMD). Additional funding for investigator effort has been provided by K02AA018755, U10AA008401, P50AA022537 to DMD, R01AA029688, and 28IR-0070 to AAP and T29KT0526 and T32IR5226 to NCK and SSR from the Tobacco-Related Disease Research Program (TRDRP), NIDA DP1DA054394 to SSR, R25MH081482-16 to NCK, R01HD092548 to KPH, as well as a European Research Council Consolidator Grant (647648 EdGe) to PDK. The content is solely the responsibility of the authors and does not necessarily represent the official views of the above funding bodies. The Externalizing Consortium would like to thank the following groups for making the research possible: 23andMe, Add Health, Vanderbilt University Medical Center’s BioVU, Collaborative Study on the Genetics of Alcoholism (COGA), the Psychiatric Genomics Consortium’s Substance Use Disorders working group, UK10K Consortium, UK Biobank, and Philadelphia Neurodevelopmental Cohort.

Publisher Copyright:
© 2023, The Author(s).

Funding

Funders and funder numbers:
COGA
Genetics of Alcoholism
National Institute on Drug Abuse: R01HD092548, 28IR-0070, R01DA050721, R25MH081482-16, K02AA018755, P50AA022537, R01AA029688, T29KT0526, U10AA008401, T32IR5226, DP1DA054394
National Institute on Alcohol Abuse and Alcoholism: R01AA015416
Tobacco-Related Disease Research Program
Vanderbilt University Medical Center
European Research Council: 647648 EdGe

    Keywords

    • Data removal
    • Down-sample
    • Genome-wide association study
    • Genomic SEM
    • Genomics
    • Leave-one-out
    • Meta-analysis
    • Summary statistics
