TY - JOUR

T1 - Measures of variation at DNA repeat loci under a general stepwise mutation model

AU - Kimmel, Marek

AU - Chakraborty, Ranajit

N1 - Funding Information:
We thank Professor Olle Nerman of the University of Gotheborg for providing insights underlying some of the derivations. This work was supported by Grants GM 41399 (R.C.) and GM 58545 (R.C., M.K.) from the National Institutes of Health, by Grants DMS 9203436 and DMS 9409909 (M.K.) from the National Science Foundation, and by the Keck Center for Computational Biology at Rice University (M.K.). Part of this work was carried out during M.K.’s visit to the University of Gotheborg in September 1995.

PY - 1996/12

Y1 - 1996/12

N2 - Polymorphisms at tandem repeat loci are caused by mutations with allele sizes occasionally altered by more than one repeat unit in both forward and backward directions. Such mutational changes may occur with asymmetric probabilities. Therefore, a one-step symmetric stepwise mutation model may not be appropriate for studying the population dynamics at all repeat loci. In this work, we evaluated the expectation and variance of the within-population variance of the allele size distribution in a finite population, and the expected homozygosity at a locus by the coalescence approach under a general stepwise mutation model, where mutational transitions of allele sizes can be arbitrary, including being asymmetric. Under the special cases of symmetric one-step, two-step, and multi-step geometric distributions of mutations, our general results reduce to the corresponding results obtained by earlier investigators. The general results indicate that in a finite population, which has reached a steady state under the (general stepwise) mutation and drift balance, the within-population variance of allele sizes has a simple expectation (i.e., proportional to Nv, the product of the mutation rate, v, and effective population size, N). However, its stochastic variance is a quadratic function of this composite parameter, Nv. Furthermore, this second-order variance does not decay with the number of alleles sampled from a population. Application of this theory to data on allele size distributions in unrelated Caucasians from the CEPH pedigree (obtained from the Genome Data Base) shows that the relationship of the variance and mean of within-population variance of allele size at tandem repeat loci, grouped by their chromosomal assignment, has a trend compatible with the theory. However, there is an indication that the second-order variance is generally underestimated. One reason for this departure might be that the CEPH sample may not represent a single homogeneous population that reached equilibrium at all tandem repeat loci.

AB - Polymorphisms at tandem repeat loci are caused by mutations with allele sizes occasionally altered by more than one repeat unit in both forward and backward directions. Such mutational changes may occur with asymmetric probabilities. Therefore, a one-step symmetric stepwise mutation model may not be appropriate for studying the population dynamics at all repeat loci. In this work, we evaluated the expectation and variance of the within-population variance of the allele size distribution in a finite population, and the expected homozygosity at a locus by the coalescence approach under a general stepwise mutation model, where mutational transitions of allele sizes can be arbitrary, including being asymmetric. Under the special cases of symmetric one-step, two-step, and multi-step geometric distributions of mutations, our general results reduce to the corresponding results obtained by earlier investigators. The general results indicate that in a finite population, which has reached a steady state under the (general stepwise) mutation and drift balance, the within-population variance of allele sizes has a simple expectation (i.e., proportional to Nv, the product of the mutation rate, v, and effective population size, N). However, its stochastic variance is a quadratic function of this composite parameter, Nv. Furthermore, this second-order variance does not decay with the number of alleles sampled from a population. Application of this theory to data on allele size distributions in unrelated Caucasians from the CEPH pedigree (obtained from the Genome Data Base) shows that the relationship of the variance and mean of within-population variance of allele size at tandem repeat loci, grouped by their chromosomal assignment, has a trend compatible with the theory. However, there is an indication that the second-order variance is generally underestimated. One reason for this departure might be that the CEPH sample may not represent a single homogeneous population that reached equilibrium at all tandem repeat loci.

UR - http://www.scopus.com/inward/record.url?scp=0030447670&partnerID=8YFLogxK

U2 - 10.1006/tpbi.1996.0035

DO - 10.1006/tpbi.1996.0035

M3 - Article

C2 - 9000494

AN - SCOPUS:0030447670

VL - 50

SP - 345

EP - 367

JO - Theoretical Population Biology

JF - Theoretical Population Biology

SN - 0040-5809

IS - 3

ER -