US forensic Y-chromosome short tandem repeats database

Jianye Ge, Bruce Budowle, John V. Planz, Arthur J. Eisenberg, Jack Ballantyne, Ranajit Chakraborty

Research output: Contribution to journal › Article › peer-review

26 Scopus citations


A forensic Y-STR database generated in the US was compiled from profiles containing partial or complete typing of 16 STR markers: DYS19, DYS385, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS456, DYS458, DYS635, DYS448, and Y GATA H4. This version of the database contained 17,447 samples, of which 77% and 20% were collected in North America and Asia, respectively. The database was separated into six general populations: African American, Asian, Caucasian, Hispanic, Indian, and Native American. Each population was further classified into subgroups according to geographic region. Some subgroups were tested, found to be homogeneous, and merged. Allele and haplotype frequencies, as well as sample sizes, were summarized. Of the full haplotypes (i.e., 16 STRs without missing data), 93.7% in the total population were distinct, 92.9% were population specific, and 89.3% were observed only once. The majority of shared haplotypes were found among North American populations, as a result of admixture over the past few hundred years. The power of discrimination (PD), coancestry coefficient (Fst), and coefficient of gene differentiation (Gst) were also calculated at the locus and haplotype levels. The most polymorphic marker was DYS385; this marker contains a tandem duplication and actually comprises two loci. Both Gst and Fst estimates were very small for haplotypes composed of a large number of STRs (e.g., 10-16 markers), although Gst is slightly more conservative for these extended haplotypes. With Native Americans removed from the total population data set, the Gst and Fst estimates decrease further. PD was 0.9998 for the total population data set for all 16 Y-STR markers. Three measures of Y-STR profile frequency were calculated: (1) unconditional haplotype frequency, (2) population substructure adjusted frequency, and (3) binomial upper bound of the haplotype frequency. The binomial upper bound is the most conservative estimate for most forensic applications. The weight of a Y-STR haplotype can be estimated using population-specific or total population databases.
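The three frequency measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so the forms used here are assumptions based on common forensic practice: a counting estimate x/N, a substructure adjustment of the form θ + (1 − θ)p, and a one-sided 95% Clopper-Pearson upper bound. The haplotype count `x` and the value of `theta` are hypothetical placeholders, not values from the paper; only the database size 17,447 comes from the abstract.

```python
from math import exp, lgamma, log, log1p


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for k in range(x + 1):
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log(p) + (n - k) * log1p(-p))
        total += exp(log_term)
    return total


def unconditional_frequency(x, n):
    """Counting estimate: haplotype observed x times in a database of n."""
    return x / n


def substructure_adjusted(p, theta):
    """Assumed adjustment form: theta + (1 - theta) * p."""
    return theta + (1.0 - theta) * p


def binomial_upper_bound(x, n, alpha=0.05):
    """One-sided (1 - alpha) Clopper-Pearson upper bound on the frequency."""
    if x >= n:
        return 1.0
    if x == 0:
        # Closed form when the haplotype is absent from the database.
        return 1.0 - alpha ** (1.0 / n)
    # The CDF decreases as p grows, so bisect for the p where it equals alpha.
    lo, hi = x / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


N = 17447      # database size reported in the abstract
x = 3          # hypothetical number of observations of a haplotype
theta = 0.001  # hypothetical substructure coefficient, not from the paper

p_hat = unconditional_frequency(x, N)
p_theta = substructure_adjusted(p_hat, theta)
p_upper = binomial_upper_bound(x, N)
p_unseen = binomial_upper_bound(0, N)
```

For a haplotype never observed in the database, the closed-form branch returns a bound slightly below the familiar 3/N rule of thumb, which is one way to sanity-check the bisection branch against a known case.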

Original language: English
Pages (from-to): 289-295
Number of pages: 7
Journal: Legal Medicine
Issue number: 6
State: Published - Nov 2010


  • Evidence interpretation
  • Fst
  • Forensic genetics
  • Gst
  • Haplotype frequency
  • Population substructure
  • Y-STR database
  • Y-chromosome

