Forensic human identification with targeted microbiome markers using nearest neighbor classification

August Eric Woerner, Nicole M.M. Novroski, Frank R. Wendt, Angela Dawn Ambers, Rachel Wiley, Sarah E. Schmedes, Bruce Budowle

Research output: Contribution to journal > Article > Research > peer-review

1 Citation (Scopus)

Abstract

From the perspective of forensic genetics, the human microbiome is a rich, relatively untapped resource for human identity testing. Because it varies within and among people, and perhaps over time, the potential forensic applications of the microbiome may extend beyond human identification. However, the same inherent variability in microbial distributions may pose a substantial barrier to predicting which individual is the source of a microbial sample unless stable signatures of the microbiome are identified and targeted. One of the more commonly adopted strategies for microbial human identification relies on quantifying which taxa are present and their respective abundance levels. It remains an open question whether such microbial signatures are more individualizing than estimates of the degree of genetic relatedness between microbial samples. This study attempts to address this question by contrasting two prediction strategies. The first approach uses phylogenetic distance to predict the host individual; it therefore operates under the premise that microbes within an individual are more closely related than microbes from different individuals. The second approach uses population genetic measures of diversity at clade-specific markers, serving as a fine-grained assessment of microbial composition and quantification. Both assessments were performed using targeted sequencing of 286 markers from 22 microbial taxa sampled in 51 individuals across three body sites measured in triplicate. Nearest neighbor and reverse nearest neighbor classifiers were constructed based on the pooled data and yielded 71% and 78% accuracy, respectively, when diversity was considered, and performed significantly worse when a phylogenetic distance was used (54% and 63% accuracy, respectively). However, when diversity was used, empirical estimates of classification accuracy reached 100% when conditioned on a maximum nearest neighbor distance, while identification based on a phylogenetic distance failed to reach saturation. These findings suggest that microbial strain composition is more individualizing than phylogeny, perhaps indicating that microbial composition may be more individualizing than recent common ancestry. One inference that may be drawn from these findings is that host-environment interactions may maintain the targeted microbial profile, and that the profile so maintained is not necessarily repopulated by intra-individual microbial strains.
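The idea of conditioning a nearest neighbor call on a maximum distance can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the two-value feature vectors, the labels, and the function names (`nearest_neighbor`, `leave_one_out`) are all assumptions made for illustration, and the reverse nearest neighbor variant is not shown. Each sample is a hypothetical vector of per-marker diversity values; a sample is assigned to the host of its closest reference sample, and no call is made when that closest sample lies farther away than the threshold.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(query, references, max_distance=None):
    """Return (label, distance) of the closest reference sample.

    references is a list of (label, vector) pairs. If max_distance is set
    and the closest sample is farther away, the label is None (no call).
    """
    label, dist = min(
        ((lab, euclidean(query, vec)) for lab, vec in references),
        key=lambda pair: pair[1],
    )
    if max_distance is not None and dist > max_distance:
        return None, dist
    return label, dist


def leave_one_out(samples, max_distance=None):
    """Leave-one-out accuracy among the samples for which a call was made.

    Returns (accuracy, number_of_calls); accuracy is NaN if no call is made.
    """
    correct = 0
    called = 0
    for i, (label, vec) in enumerate(samples):
        refs = samples[:i] + samples[i + 1:]
        predicted, _ = nearest_neighbor(vec, refs, max_distance)
        if predicted is None:
            continue
        called += 1
        if predicted == label:
            correct += 1
    accuracy = correct / called if called else float("nan")
    return accuracy, called


# Hypothetical toy data: host label and two made-up diversity values.
samples = [
    ("A", (0.10, 0.20)),
    ("A", (0.12, 0.22)),
    ("B", (0.80, 0.90)),
    ("B", (0.82, 0.88)),
    ("C", (0.45, 0.50)),
    ("C", (0.15, 0.25)),
]

print(leave_one_out(samples))                     # every sample gets a call
print(leave_one_out(samples, max_distance=0.03))  # only close matches count
```

In this toy data one sample from host C sits closer to host A than to its own replicate, so it is misclassified when every sample must receive a call. With the threshold, that sample and the other distant one receive no call, and the remaining calls are all correct. This mirrors, in a simplified way, how accuracy can rise when it is conditioned on a maximum nearest neighbor distance, at the cost of making fewer calls.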

Original language: English
Pages (from-to): 130-139
Number of pages: 10
Journal: Forensic Science International: Genetics
Volume: 38
DOI: 10.1016/j.fsigen.2018.10.003
State: Published - 1 Jan 2019

Fingerprint

Forensic Anthropology
Microbiota
Forensic Genetics
Population Genetics
Phylogeny
Maintenance

Keywords

  • Human identification
  • Massively parallel sequencing
  • Microbiome
  • Next generation sequencing
  • hidSkinPlex

Cite this

Woerner, August Eric ; Novroski, Nicole M.M. ; Wendt, Frank R. ; Ambers, Angela Dawn ; Wiley, Rachel ; Schmedes, Sarah E. ; Budowle, Bruce. / Forensic human identification with targeted microbiome markers using nearest neighbor classification. In: Forensic Science International: Genetics. 2019 ; Vol. 38. pp. 130-139.
@article{44e92871c0124401846197ab4c149a9d,
title = "Forensic human identification with targeted microbiome markers using nearest neighbor classification",
keywords = "Human identification, Massively parallel sequencing, Microbiome, Next generation sequencing, hidSkinPlex",
author = "Woerner, {August Eric} and Novroski, {Nicole M.M.} and Wendt, {Frank R.} and Ambers, {Angela Dawn} and Rachel Wiley and Schmedes, {Sarah E.} and Bruce Budowle",
year = "2019",
month = "1",
day = "1",
doi = "10.1016/j.fsigen.2018.10.003",
language = "English",
volume = "38",
pages = "130--139",
journal = "Forensic Science International: Genetics",
issn = "1872-4973",
publisher = "Elsevier Ireland Ltd",

}

TY - JOUR

T1 - Forensic human identification with targeted microbiome markers using nearest neighbor classification

AU - Woerner, August Eric

AU - Novroski, Nicole M.M.

AU - Wendt, Frank R.

AU - Ambers, Angela Dawn

AU - Wiley, Rachel

AU - Schmedes, Sarah E.

AU - Budowle, Bruce

PY - 2019/1/1

Y1 - 2019/1/1

KW - Human identification

KW - Massively parallel sequencing

KW - Microbiome

KW - Next generation sequencing

KW - hidSkinPlex

UR - http://www.scopus.com/inward/record.url?scp=85055909768&partnerID=8YFLogxK

U2 - 10.1016/j.fsigen.2018.10.003

DO - 10.1016/j.fsigen.2018.10.003

M3 - Article

VL - 38

SP - 130

EP - 139

JO - Forensic Science International: Genetics

JF - Forensic Science International: Genetics

SN - 1872-4973

ER -