TY - JOUR
T1 - Using unique molecular identifiers to improve allele calling in low-template mixtures
AU - Crysup, Benjamin
AU - Mandape, Sammed
AU - King, Jonathan L.
AU - Muenzler, Melissa
AU - Kapema, Kapema Bupe
AU - Woerner, August E.
N1 - Funding Information:
This work was supported in part by award no. 2018-DU-BX-0177, awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect those of the U.S. Department of Justice.
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2023/3
Y1 - 2023/3
N2 - PCR artifacts are an ever-present challenge in sequencing applications. These artifacts can seriously limit the analysis and interpretation of low-template samples and mixtures, especially with respect to a minor contributor. In medicine, molecular barcoding techniques have been employed to decrease the impact of PCR error and to allow the examination of low-abundance somatic variation. In principle, it should be possible to apply the same techniques to the forensic analysis of mixtures. To that end, several short tandem repeat loci were selected for targeted sequencing, and a bioinformatic pipeline for analyzing the sequence data was developed. The pipeline notes the relevant unique molecular identifiers (UMIs) attached to each read and, using machine learning, filters the noise products out of the set of potential alleles. To evaluate this pipeline, DNA from pairs of individuals was mixed at different ratios (1:1, 1:9) and sequenced with different starting amounts of DNA (10, 1 and 0.1 ng). Naïvely using the information in the molecular barcodes led to increased performance, and the machine learning filter provided an additional benefit. In concrete terms, using the UMI data results in less noise for a given amount of dropout. For instance, if thresholds are selected that filter out a quarter of the true alleles, using read counts accepts 2381 noise alleles and using raw UMI counts accepts 1726 noise alleles, while the machine learning approach accepts only 307.
AB - PCR artifacts are an ever-present challenge in sequencing applications. These artifacts can seriously limit the analysis and interpretation of low-template samples and mixtures, especially with respect to a minor contributor. In medicine, molecular barcoding techniques have been employed to decrease the impact of PCR error and to allow the examination of low-abundance somatic variation. In principle, it should be possible to apply the same techniques to the forensic analysis of mixtures. To that end, several short tandem repeat loci were selected for targeted sequencing, and a bioinformatic pipeline for analyzing the sequence data was developed. The pipeline notes the relevant unique molecular identifiers (UMIs) attached to each read and, using machine learning, filters the noise products out of the set of potential alleles. To evaluate this pipeline, DNA from pairs of individuals was mixed at different ratios (1:1, 1:9) and sequenced with different starting amounts of DNA (10, 1 and 0.1 ng). Naïvely using the information in the molecular barcodes led to increased performance, and the machine learning filter provided an additional benefit. In concrete terms, using the UMI data results in less noise for a given amount of dropout. For instance, if thresholds are selected that filter out a quarter of the true alleles, using read counts accepts 2381 noise alleles and using raw UMI counts accepts 1726 noise alleles, while the machine learning approach accepts only 307.
KW - DNA Mixtures
KW - Machine Learning
KW - Massively Parallel Sequencing
KW - Molecular Barcodes
KW - Stutter
UR - http://www.scopus.com/inward/record.url?scp=85145673669&partnerID=8YFLogxK
U2 - 10.1016/j.fsigen.2022.102807
DO - 10.1016/j.fsigen.2022.102807
M3 - Article
C2 - 36462297
AN - SCOPUS:85145673669
SN - 1872-4973
VL - 63
JO - Forensic Science International: Genetics
JF - Forensic Science International: Genetics
M1 - 102807
ER -