Compound repeats provide a testing ground for evaluating hypotheses on the causes and consequences of stutter. Because compound repeats have two repeating motifs, each motif may produce stutter variants; just as the repeat within the locus is compound, so is the resultant stutter. Further, the rates of stutter formation for the two motifs may not be independent. This lack of independence may complicate modeling strategies, contributing to the challenges of mixture interpretation methods that rely on nucleotide sequence. This study evaluates compound stutter in two STR loci: D2S1338 and D12S391. The effects of flanking variation, as well as possible interactions between the two uninterrupted stretches (US) and their respective stutter variants, are assessed. Multivariate multiple linear regression (MMLR) was used to show that, as with simple repeats, the rate of stutter product formation of a particular repeating motif is not solely a function of the US of that repeat. The nucleotides adjacent to the repeating motif also appear to influence the rate of stutter formation of that motif, and these adjacent nucleotides sometimes include the other motif. MMLR was used to estimate the size of these effects and to construct an example of a two-dimensional (thus, compound) stutter prediction. This example may merit further investigation in the application of massively parallel sequencing data to mixture interpretation and probabilistic genotyping.
- Flanking variation
- Massively parallel sequencing
- Probabilistic genotyping
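
To make the idea of a two-dimensional stutter prediction concrete, the following is a minimal sketch of multivariate multiple linear regression, not the model fitted in this study. It assumes, purely for illustration, that the two responses are the stutter ratios of the two motifs and that the only predictors are the two US lengths; the study's actual predictors (for example, flanking variation) are not reproduced here. The function names `fit_mmlr` and `predict_stutter`, the coefficient matrix `true_B`, and all numbers are hypothetical and synthetic.

```python
import numpy as np


def fit_mmlr(X, Y):
    """Least-squares fit of Y ~ [1, X] @ B, with one column of B per response."""
    design = np.column_stack([np.ones(len(X)), X])
    B, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return B


def predict_stutter(B, us_lengths):
    """Predict both stutter ratios jointly from the two US lengths."""
    design = np.column_stack([np.ones(len(us_lengths)), us_lengths])
    return design @ B


# Synthetic, noise-free toy data (NOT measurements from the study).
# Each row holds (US length of motif 1, US length of motif 2).
us = np.array(
    [[8, 10], [9, 11], [10, 9], [11, 12], [12, 8], [13, 10]], dtype=float
)

# Arbitrary illustrative coefficients: rows are intercept, US1 effect,
# US2 effect; columns are the stutter ratios of motif 1 and motif 2.
# The nonzero off-diagonal effects encode the assumed dependence of each
# motif's stutter on the other motif's US.
true_B = np.array(
    [[0.010, 0.005],
     [0.008, 0.001],
     [0.002, 0.006]]
)
ratios = np.column_stack([np.ones(len(us)), us]) @ true_B

B_hat = fit_mmlr(us, ratios)
pred = predict_stutter(B_hat, np.array([[10.0, 10.0]]))
print(pred)  # one row: predicted stutter ratios for motif 1 and motif 2
```

Fitting both responses against a shared design matrix is what makes the prediction two-dimensional: a single allele, described by its two US lengths, yields a pair of expected stutter ratios rather than one.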