TY - JOUR
T1 - A novel phylogenetic approach for de novo discovery of putative nuclear mitochondrial (pNumt) haplotypes
AU - Smart, Utpal
AU - Budowle, Bruce
AU - Ambers, Angie
AU - Soares Moura-Neto, Rodrigo
AU - Silva, Rosane
AU - Woerner, August E.
N1 - Funding Information:
This work was supported in part by award no. 2017-DN-BX-0134, awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. Analysis of the Rio de Janeiro samples (RSMN) was supported by CAPES Pro-Forense grant n° 23038.006844/2014-46. The authors would like to thank the two anonymous reviewers for helping to significantly improve the manuscript. The authors also extend their thanks to Jonathan King, Jennifer Cihlar and Magdalena Bus of the UNTHSC Center of Human ID for their valuable input on the publication.
Publisher Copyright:
© 2019
PY - 2019/11
Y1 - 2019/11
AB - Current approaches for parsing true variation (i.e., signal) from noise broadly involve estimating a baseline value of the latter, below which all sequence data are ignored. In an effort to deliver a more objective criterion for setting such thresholds, a novel approach based on phylogenetic principles is presented here. Our method separates a special category of noise from true mitochondrial genome data, namely nuclear insertions of mitochondrial DNA (Numts). This bioinformatic approach leverages the relationships among massively parallel sequencing reads and is capable of discovering putative Numts (pNumts) in the absence of a reference genome. The new method was tested on a whole mitochondrial genome dataset (n = 41 individuals from an admixed population sample from Rio de Janeiro) and led to the discovery of 451 pNumt variants. Comparison of these pNumt haplotypes against an existing Numt database revealed 147 exact matches to previously discovered Numts, while 122 haplotypes differed by only a single base pair and none matched exclusively to the mitochondrial genome. In general, these sequences were considerably more divergent from the mitochondrial genome than from those of the Numt database, suggesting that the novel pNumts are probably hitherto uncatalogued variants. Unlike previous techniques, our method appears to be able to detect both polymorphic and fixed Numt sequences. It was also found that the region containing the D-Loop and associated Promoters (DLP) in the human mitochondrial genome, which harbors markers of forensic genetic importance, is the origin of several Numts. Though currently designed for the mitochondrial genome, our novel approach has the potential to be expanded to other scenarios that require distinguishing signal from noise, including the deconvolution of mixtures, thus significantly improving how analytical thresholds may be established.
KW - Analytical threshold
KW - Bioinformatics
KW - Massively parallel sequencing
KW - Mitochondrial haplotypes
KW - PCR errors
KW - Phylogenetic networks
KW - Randomized minimum spanning trees
KW - pNumts
UR - http://www.scopus.com/inward/record.url?scp=85070975370&partnerID=8YFLogxK
U2 - 10.1016/j.fsigen.2019.102146
DO - 10.1016/j.fsigen.2019.102146
M3 - Article
C2 - 31446343
AN - SCOPUS:85070975370
SN - 1872-4973
VL - 43
JO - Forensic Science International: Genetics
JF - Forensic Science International: Genetics
M1 - 102146
ER -