TY - JOUR
T1 - Calculation and implementation of sample-wide stochastic thresholds for forensic genetic analysis of STRs and SNPs for massively parallel sequencing platforms
AU - Stephens, Kathryn
AU - Snedecor, June
AU - Budowle, Bruce
N1 - Funding Information:
The authors would like to thank Juan Carlos Perez, Richelle Barta, Keenan Fleming, and Michaela Russo for preparing libraries and performing sequencing runs for this study and Joana Antunes for reviewing this manuscript. Kathryn Stephens and June Snedecor are current employees of Verogen, Inc. where the experiments were planned, performed and analyzed. Bruce Budowle is a consultant for Verogen, Inc.
Publisher Copyright:
© 2022 The Authors
PY - 2022/12
Y1 - 2022/12
N2 - Capillary electrophoresis (CE) analysis of short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) uses a stochastic threshold to consider the possibility of missing alleles (dropouts) or detecting additional alleles (drop-ins). In CE, this threshold may be approximately 200 RFU, and peak heights are assessed relative to this threshold. In next generation sequencing (NGS), also known as massively parallel sequencing (MPS), STRs are identified by their sequence, and specific alleles are identified by their repeat number and intra-allelic variation. Abundance is approximated by the number of sequence reads for each allele. The total number of reads generated for each marker in a sample depends on factors such as the number of samples pooled for sequencing, the number of markers in the assay, the integrity and quantity of the input DNA sample, and the inter-locus balance of the assay. For multiplexes that contain both autosomal and sex-linked markers, the biological sex of the sample also influences total reads per locus. To normalize these variables and better establish a robust stochastic threshold, a sample-wide metric is proposed for estimating the possibility of dropouts or drop-ins based on the variance of the inter-locus balance of the markers across a sample. The intuition is that samples with globally variable allele balance are more likely to have noisier data and therefore require more stringent read count thresholds. This method is robust to sequencing multiplexity, biological sex, and manufacturing lot variation.
KW - Next generation sequencing
KW - STR
KW - Stochastic threshold
UR - http://www.scopus.com/inward/record.url?scp=85139277023&partnerID=8YFLogxK
U2 - 10.1016/j.fsigss.2022.09.032
DO - 10.1016/j.fsigss.2022.09.032
M3 - Article
AN - SCOPUS:85139277023
SN - 1875-1768
VL - 8
SP - 88
EP - 90
JO - Forensic Science International: Genetics Supplement Series
JF - Forensic Science International: Genetics Supplement Series
ER -