TY - JOUR
T1 - Accurate prediction of a minimal region around a genetic association signal that contains the causal variant
AU - Bochdanovits, Z.
AU - Simon Sanchez, J.
AU - Jonker, M.A.
AU - Hoogendijk, W.J.G.
AU - van der Vaart, A.W.
AU - Heutink, P.
PY - 2014
Y1 - 2014
AB - In recent years, genome-wide association studies have been very successful in identifying loci for complex traits. However, these findings typically involve noncoding and/or intergenic SNPs that have no clear functional effect and do not point directly to a gene. Hence, the challenge is to identify the causal variant responsible for the association signal. Typically, the first step is to identify all genetic variation in the locus region, usually by resequencing a large number of case chromosomes. The causal variant must then be identified among all these variants in further functional studies. Because the experimental follow-up can be very laborious, restricting the number of variants to be scrutinized can yield a great advantage. An objective method for choosing the size of the region to be followed up would be highly valuable. Here, we propose a simple method to call the minimal region around a significant association peak that is very likely to contain the causal variant. We model linkage disequilibrium (LD) in cases from the observed single-SNP association signals, and predict the location of the causal variant by quantifying how well this relationship fits the data. Simulations showed that our approach identifies genomic regions averaging ∼50 kb that have up to a 90% probability of containing the causal variant. We apply our method to two genome-wide association data sets and localize both the functional variant REP1 in the synuclein gene that conveys susceptibility to Parkinson's disease and the APOE gene responsible for the association signal in the Alzheimer's disease data set. © 2014 Macmillan Publishers Limited. All rights reserved.
U2 - 10.1038/ejhg.2013.115
DO - 10.1038/ejhg.2013.115
M3 - Article
SN - 1018-4813
VL - 22
SP - 238
EP - 242
JO - European Journal of Human Genetics
JF - European Journal of Human Genetics
IS - 2
ER -