The genetic dissection of quantitative traits, or endophenotypes, usually involves genetic linkage or association analysis in pedigrees and subsequent fine-mapping association analysis in the population. The ascertainment procedure for quantitative traits often results in unequal variance of observations. For example, some phenotypes may be clinically measured whilst others are from self-reports, or phenotypes may be the average of multiple measures but with the number of measurements varying. The resulting heterogeneity of variance poses no real problem for analysis, as long as it is properly modelled and thereby taken into account. However, if statistical significance is determined using an empirical permutation procedure, it is not obvious what the units of sampling are. We investigated a number of permutation approaches in a simulation study of an association analysis between a quantitative trait and a single nucleotide polymorphism. Our simulations were designed such that we knew the true p-value of each test statistic. The permutation methods were compared using the regression of true on empirical p-values and the precision of the empirical p-values. We show that the best procedure involves an implicit adjustment of the original data for the effects in the model before permutation, and that other methods, some of which seemed appropriate a priori, are relatively biased. © 2009 Wiley-Liss, Inc.
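To make the sampling-unit question concrete, the following is a minimal sketch of one way to permute under heterogeneous variance. It is one possible reading of "adjusting the data for the effects in the model before permutation", not necessarily the exact procedure evaluated in the study. It assumes each phenotype is the mean of a known number of measurements, so that the residual variance of individual i is proportional to 1/w_i, and it permutes null-model residuals on the variance-standardised scale (a Freedman-Lane-style scheme). The function names `wls_t_stat` and `permutation_p_value`, the weights `w`, and the additive 0/1/2 genotype coding are illustrative assumptions.

```python
import numpy as np


def wls_t_stat(y, x, w):
    """t-statistic for the SNP slope in a weighted regression of y on x.

    w holds the assumed known weights (e.g. number of measurements per
    individual), so Var(e_i) is taken to be sigma^2 / w_i.
    """
    sw = np.sqrt(w)
    X = np.column_stack([sw, sw * x])      # whitened design: intercept, genotype
    ys = sw * y                            # whitened phenotype
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    resid = ys - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)  # residual variance on whitened scale
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


def permutation_p_value(y, x, w, n_perm=2000, seed=0):
    """Empirical two-sided p-value from permuting standardised residuals."""
    rng = np.random.default_rng(seed)
    sw = np.sqrt(w)
    t_obs = wls_t_stat(y, x, w)

    # Adjust for the effects in the null model (here only a weighted mean),
    # then scale residuals so they share a common variance under the null.
    mu0 = np.sum(w * y) / np.sum(w)
    e_star = sw * (y - mu0)

    exceed = 0
    for _ in range(n_perm):
        e_perm = rng.permutation(e_star)   # shuffle exchangeable residuals
        y_perm = mu0 + e_perm / sw         # restore each individual's own variance
        if abs(wls_t_stat(y_perm, x, w)) >= abs(t_obs):
            exceed += 1

    # Add-one correction keeps the empirical p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

The design choice illustrated here is that the unit being shuffled is the standardised residual rather than the raw phenotype: permuting raw phenotypes would move an observation to an individual with a different weight, so the permuted data would no longer match the variance structure assumed by the weighted analysis.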