Abstract
Proprietary genetic datasets are valuable for boosting the statistical power of genome-wide association studies (GWASs), but their use can restrict investigators from publicly sharing the resulting summary statistics. Although researchers can resort to sharing down-sampled versions that exclude restricted data, down-sampling reduces power and might change the genetic etiology of the phenotype being studied. These problems are further complicated when using multivariate GWAS methods, such as genomic structural equation modeling (Genomic SEM), that model genetic correlations across multiple traits. Here, we propose a systematic approach to assess the comparability of GWAS summary statistics that include versus exclude restricted data. Illustrating this approach with a multivariate GWAS of an externalizing factor, we assessed the impact of down-sampling on (1) the strength of the genetic signal in univariate GWASs, (2) the factor loadings and model fit in multivariate Genomic SEM, (3) the strength of the genetic signal at the factor level, (4) insights from gene-property analyses, (5) the pattern of genetic correlations with other traits, and (6) polygenic score analyses in independent samples. For the externalizing GWAS, although down-sampling resulted in a loss of genetic signal and fewer genome-wide significant loci, the factor loadings and model fit, gene-property analyses, genetic correlations, and polygenic score analyses were found to be robust. Given the importance of data sharing for the advancement of open science, we recommend that investigators who generate and share down-sampled summary statistics report these analyses as accompanying documentation to support other researchers’ use of the summary statistics.
Original language | English |
---|---|
Pages (from-to) | 404-415 |
Number of pages | 12 |
Journal | Behavior Genetics |
Volume | 53 |
Issue number | 5-6 |
Publication status | Published - Nov 2023 |
Bibliographical note
Funding Information: This research was conducted by the Externalizing Consortium. The Externalizing Consortium has been supported by the National Institute on Alcohol Abuse and Alcoholism (R01AA015416 – administrative supplement to DMD), and the National Institute on Drug Abuse (R01DA050721 to DMD). Additional funding for investigator effort has been provided by K02AA018755, U10AA008401, P50AA022537 to DMD, R01AA029688, and 28IR-0070 to AAP and T29KT0526 and T32IR5226 to NCK and SSR from the Tobacco-Related Disease Research Program (TRDRP), NIDA DP1DA054394 to SSR, R25MH081482-16 to NCK, R01HD092548 to KPH, as well as a European Research Council Consolidator Grant (647648 EdGe) to PDK. The content is solely the responsibility of the authors and does not necessarily represent the official views of the above funding bodies. The Externalizing Consortium would like to thank the following groups for making the research possible: 23andMe, Add Health, Vanderbilt University Medical Center’s BioVU, Collaborative Study on the Genetics of Alcoholism (COGA), the Psychiatric Genomics Consortium’s Substance Use Disorders working group, UK10K Consortium, UK Biobank, and Philadelphia Neurodevelopmental Cohort.
Publisher Copyright:
© 2023, The Author(s).
Funding
Funders | Funder number |
---|---|
Collaborative Study on the Genetics of Alcoholism (COGA) | |
National Institute on Drug Abuse | R01HD092548, 28IR-0070, R01DA050721, R25MH081482-16, K02AA018755, P50AA022537, R01AA029688, T29KT0526, U10AA008401, T32IR5226, DP1DA054394 |
National Institute on Alcohol Abuse and Alcoholism | R01AA015416 |
Tobacco-Related Disease Research Program | |
Vanderbilt University Medical Center | |
European Research Council | 647648 EdGe |
Keywords
- Data removal
- Down-sample
- Genome-wide association study
- Genomic SEM
- Genomics
- Leave-one-out
- Meta-analysis
- Summary statistics