A novel phylogenetic approach for de novo discovery of putative nuclear mitochondrial (pNumt) haplotypes

Utpal Smart, Bruce Budowle, Angie Ambers, Rodrigo Soares Moura-Neto, Rosane Silva, August Eric Woerner

Research output: Contribution to journal › Article

Abstract

Current approaches for parsing true variation (i.e., signal) from noise broadly involve estimating a baseline value of the latter, below which all sequence data are ignored. In an effort to deliver a more objective criterion for setting such thresholds, a novel approach based on phylogenetic principles is presented here. Our method distinguishes a special category of noise from true mitochondrial genome data, namely nuclear insertions of mitochondrial DNA (Numts). This bioinformatic approach leverages the relationships among massively parallel sequencing reads and can discover putative Numts (pNumts) in the absence of a reference genome. The new method was tested on a whole mitochondrial genome dataset (n = 41 individuals from an admixed population sample from Rio de Janeiro) and led to the discovery of 451 pNumt variants. Comparison of these pNumt haplotypes against an existing Numt database revealed 147 exact matches to previously discovered Numts, while 122 haplotypes differed by only a single base pair, and none matched exclusively to the mitochondrial genome. In general, these sequences were considerably more divergent from the mitochondrial genome than from the sequences in the Numt database, supporting the inference that the novel pNumts were probably hitherto uncatalogued variants. Unlike previous techniques, our method appears to be able to detect both polymorphic and fixed Numt sequences. It was also found that the region of the human mitochondrial genome containing the D-Loop and associated Promoters (DLP), which harbors markers of importance in forensic genetics, is the origin of several Numts. Though currently designed for the mitochondrial genome, our approach has the potential to be extended to other scenarios that require construing signal from noise, including the deconvolution of mixtures, and could thereby significantly improve how analytical thresholds are established.
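Two quantitative ideas in the abstract can be made concrete with a short sketch: the conventional fixed analytical threshold that the authors aim to replace, and a simple exact-match / single-base-pair comparison of haplotypes against a reference set, analogous to the tallies reported above. This is not the authors' phylogenetic (randomized minimum spanning tree) method. It assumes reads have already been aligned to equal length, and the function names, the 5% threshold and the toy sequences are hypothetical.

```python
from collections import Counter


def apply_analytical_threshold(reads, min_fraction):
    """Conventional fixed-threshold filter (illustrative only): keep haplotypes
    whose share of all reads meets a noise baseline; anything below it is ignored."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {hap: n for hap, n in counts.items() if n / total >= min_fraction}


def hamming(a, b):
    """Number of mismatched positions; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def classify_against_database(haplotype, database):
    """Label a haplotype by its closest entry in a (hypothetical) Numt reference set:
    'exact' (0 differences), 'one_bp' (1 difference) or 'divergent' (2 or more)."""
    if not database:
        raise ValueError("database must not be empty")
    best = min(hamming(haplotype, ref) for ref in database)
    if best == 0:
        return "exact"
    if best == 1:
        return "one_bp"
    return "divergent"


if __name__ == "__main__":
    # Toy reads: one dominant haplotype, one minor haplotype, one low-level variant.
    reads = ["ACGTAC"] * 90 + ["ACGTTC"] * 8 + ["ACCTAC"] * 2
    print(apply_analytical_threshold(reads, 0.05))  # the 2% variant is discarded

    numt_db = ["ACGTTC", "TCGTAC"]  # hypothetical reference sequences
    for hap in ["ACGTTC", "ACCTTC", "GGGGGG"]:
        print(hap, classify_against_database(hap, numt_db))
```

The fixed threshold silently discards every low-frequency haplotype, whether it is error or a genuine minor sequence such as a Numt; the abstract's argument is that modelling the relationships among reads offers a more objective way to draw that line.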

Original language: English
Article number: 102146
Journal: Forensic Science International: Genetics
Volume: 43
DOI: 10.1016/j.fsigen.2019.102146
State: Published - 1 Nov 2019

Keywords

  • Analytical threshold
  • Bioinformatics
  • Massively parallel sequencing
  • Mitochondrial haplotypes
  • PCR errors
  • Phylogenetic networks
  • Randomized minimum spanning trees
  • pNumts

Cite this

@article{20e8c4f4d9b9438db8b2b2024e78a73e,
title = "A novel phylogenetic approach for de novo discovery of putative nuclear mitochondrial (pNumt) haplotypes",
keywords = "Analytical threshold, Bioinformatics, Massively parallel sequencing, Mitochondrial haplotypes, PCR errors, Phylogenetic networks, Randomized minimum spanning trees, pNumts",
author = "Smart, Utpal and Budowle, Bruce and Ambers, Angie and {Soares Moura-Neto}, Rodrigo and Silva, Rosane and Woerner, August Eric",
year = "2019",
month = "11",
day = "1",
doi = "10.1016/j.fsigen.2019.102146",
language = "English",
volume = "43",
journal = "Forensic Science International: Genetics",
issn = "1872-4973",
publisher = "Elsevier Ireland Ltd",

}

A novel phylogenetic approach for de novo discovery of putative nuclear mitochondrial (pNumt) haplotypes. / Smart, Utpal; Budowle, Bruce; Ambers, Angie; Soares Moura-Neto, Rodrigo; Silva, Rosane; Woerner, August Eric.

In: Forensic Science International: Genetics, Vol. 43, 102146, 01.11.2019.

TY  - JOUR
T1  - A novel phylogenetic approach for de novo discovery of putative nuclear mitochondrial (pNumt) haplotypes
AU  - Smart, Utpal
AU  - Budowle, Bruce
AU  - Ambers, Angie
AU  - Soares Moura-Neto, Rodrigo
AU  - Silva, Rosane
AU  - Woerner, August Eric
PY  - 2019/11/1
Y1  - 2019/11/1
KW  - Analytical threshold
KW  - Bioinformatics
KW  - Massively parallel sequencing
KW  - Mitochondrial haplotypes
KW  - PCR errors
KW  - Phylogenetic networks
KW  - Randomized minimum spanning trees
KW  - pNumts
UR  - http://www.scopus.com/inward/record.url?scp=85070975370&partnerID=8YFLogxK
U2  - 10.1016/j.fsigen.2019.102146
DO  - 10.1016/j.fsigen.2019.102146
M3  - Article
C2  - 31446343
AN  - SCOPUS:85070975370
VL  - 43
JO  - Forensic Science International: Genetics
JF  - Forensic Science International: Genetics
SN  - 1872-4973
M1  - 102146
ER  -