Modeling SNP array ascertainment with Approximate Bayesian Computation for demographic inference

Consuelo D. Quinto-Cortés, August Eric Woerner, Joseph C. Watkins, Michael F. Hammer

Research output: Contribution to journal › Article › Research › peer-review

1 Citation (Scopus)

Abstract

Single nucleotide polymorphisms (SNPs) in commercial arrays have often been discovered in a small number of samples from selected populations. This ascertainment skews patterns of nucleotide diversity and affects population genetic inferences. We propose a demographic inference pipeline that explicitly models the SNP discovery protocol in an Approximate Bayesian Computation (ABC) framework. We simulated genomic regions according to a demographic model incorporating parameters for the divergence of three well-characterized HapMap populations and recreated the SNP distribution of a commercial array by varying the number of haploid samples and the allele frequency cut-off in the given regions. We then calculated summary statistics from both the ascertained and genomic data and inferred ascertainment and demographic parameters. We applied our pipeline to study the admixture process that gave rise to the present-day Mexican population. Our estimate of the time of admixture is closer to the historical dates than estimates from previous studies that did not consider ascertainment bias. Although the use of whole genome sequences for demographic inference is becoming the norm, there are still underrepresented areas of the world from which only SNP array data are available. Our inference framework is applicable to those cases.
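
The pipeline described in the abstract can be illustrated with a minimal sketch of ABC rejection sampling that treats the ascertainment scheme as part of the simulated model. This is not the authors' implementation. It is a toy under stated assumptions: allele frequencies are drawn from a Beta(shape, 1) distribution as a stand-in for a demographic simulation (the paper simulates genomic regions under a divergence model), the discovery protocol is reduced to a panel size in haploid samples plus a minor-allele-frequency cut-off, and the two summary statistics, the prior ranges, and every function name are hypothetical choices made for illustration.

```python
import math
import random


def simulate_frequencies(n_snps, shape, rng):
    # Toy stand-in for a demographic simulation (NOT a coalescent model):
    # population allele frequencies are drawn from Beta(shape, 1).
    return [rng.betavariate(shape, 1.0) for _ in range(n_snps)]


def sample_frequency(p, n_haploid, rng):
    # Allele frequency observed in n_haploid chromosomes sampled at frequency p.
    return sum(rng.random() < p for _ in range(n_haploid)) / n_haploid


def ascertain(freqs, panel_size, maf_cutoff, rng):
    # Assumed discovery protocol: a SNP is placed on the "array" only if it is
    # polymorphic in a discovery panel of panel_size haploid samples and its
    # minor-allele frequency in that panel reaches maf_cutoff.
    kept = []
    for p in freqs:
        q = sample_frequency(p, panel_size, rng)
        maf = min(q, 1.0 - q)
        if maf > 0.0 and maf >= maf_cutoff:
            kept.append(p)
    return kept


def summaries(freqs, sample_size, rng):
    # Two illustrative summary statistics from a study sample:
    # mean heterozygosity and the fraction of SNPs with sample MAF below 0.1.
    if not freqs:
        return (0.0, 0.0)
    het = 0.0
    rare = 0
    for p in freqs:
        q = sample_frequency(p, sample_size, rng)
        het += 2.0 * q * (1.0 - q)
        rare += min(q, 1.0 - q) < 0.1
    return (het / len(freqs), rare / len(freqs))


def simulate(shape, panel_size, maf_cutoff, rng, n_snps=100, sample_size=20):
    # Summary statistics from the ascertained ("array") SNPs followed by
    # those from all simulated ("genomic") SNPs.
    genome = simulate_frequencies(n_snps, shape, rng)
    array = ascertain(genome, panel_size, maf_cutoff, rng)
    return summaries(array, sample_size, rng) + summaries(genome, sample_size, rng)


def abc_rejection(observed, n_draws, n_keep, rng):
    # Draw parameters from (assumed) uniform priors, simulate, and keep the
    # n_keep draws whose summaries are closest to the observed ones.
    draws = []
    for _ in range(n_draws):
        params = (rng.uniform(0.2, 2.0), rng.randint(2, 20), rng.uniform(0.0, 0.3))
        dist = math.dist(observed, simulate(*params, rng))
        draws.append((dist, params))
    draws.sort(key=lambda d: d[0])
    return draws[:n_keep]


if __name__ == "__main__":
    rng = random.Random(1)
    # Pseudo-observed data generated from known parameter values.
    observed = simulate(0.5, 4, 0.25, rng)
    accepted = abc_rejection(observed, n_draws=200, n_keep=10, rng=rng)
    for name, idx in (("shape", 0), ("panel_size", 1), ("maf_cutoff", 2)):
        mean = sum(params[idx] for _, params in accepted) / len(accepted)
        print(f"posterior mean {name}: {mean:.3f}")
```

The accepted draws approximate the joint posterior over the ascertainment parameters (panel size and cut-off) and the toy demographic parameter. With so few simulations and only four summary statistics the estimates are rough; a realistic analysis would use a proper population genetic simulator and far more draws.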

Original language: English
Article number: 10209
Journal: Scientific Reports
Volume: 8
Issue number: 1
DOI: 10.1038/s41598-018-28539-y
State: Published - 1 Dec 2018

Fingerprint

Nucleotides
Polymorphism
Pipelines
Genes
Statistics

Cite this

Quinto-Cortés, Consuelo D. ; Woerner, August Eric ; Watkins, Joseph C. ; Hammer, Michael F. / Modeling SNP array ascertainment with Approximate Bayesian Computation for demographic inference. In: Scientific Reports. 2018 ; Vol. 8, No. 1.
@article{929fc9adf9514a5bbc31d3a19faaf15d,
title = "Modeling SNP array ascertainment with Approximate Bayesian Computation for demographic inference",
abstract = "Single nucleotide polymorphisms (SNPs) in commercial arrays have often been discovered in a small number of samples from selected populations. This ascertainment skews patterns of nucleotide diversity and affects population genetic inferences. We propose a demographic inference pipeline that explicitly models the SNP discovery protocol in an Approximate Bayesian Computation (ABC) framework. We simulated genomic regions according to a demographic model incorporating parameters for the divergence of three well-characterized HapMap populations and recreated the SNP distribution of a commercial array by varying the number of haploid samples and the allele frequency cut-off in the given regions. We then calculated summary statistics obtained from both the ascertained and genomic data and inferred ascertainment and demographic parameters. We implemented our pipeline to study the admixture process that gave rise to the present-day Mexican population. Our estimate of the time of admixture is closer to the historical dates than those in previous works which did not consider ascertainment bias. Although the use of whole genome sequences for demographic inference is becoming the norm, there are still underrepresented areas of the world from where only SNP array data are available. Our inference framework is applicable to those cases and will help with the demographic inference.",
author = "Quinto-Cort{\'e}s, {Consuelo D.} and Woerner, {August Eric} and Watkins, {Joseph C.} and Hammer, {Michael F.}",
year = "2018",
month = "12",
day = "1",
doi = "10.1038/s41598-018-28539-y",
language = "English",
volume = "8",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",

}

In: Scientific Reports, Vol. 8, No. 1, 10209, 01.12.2018.

TY - JOUR

T1 - Modeling SNP array ascertainment with Approximate Bayesian Computation for demographic inference

AU - Quinto-Cortés, Consuelo D.

AU - Woerner, August Eric

AU - Watkins, Joseph C.

AU - Hammer, Michael F.

PY - 2018/12/1

Y1 - 2018/12/1

AB - Single nucleotide polymorphisms (SNPs) in commercial arrays have often been discovered in a small number of samples from selected populations. This ascertainment skews patterns of nucleotide diversity and affects population genetic inferences. We propose a demographic inference pipeline that explicitly models the SNP discovery protocol in an Approximate Bayesian Computation (ABC) framework. We simulated genomic regions according to a demographic model incorporating parameters for the divergence of three well-characterized HapMap populations and recreated the SNP distribution of a commercial array by varying the number of haploid samples and the allele frequency cut-off in the given regions. We then calculated summary statistics obtained from both the ascertained and genomic data and inferred ascertainment and demographic parameters. We implemented our pipeline to study the admixture process that gave rise to the present-day Mexican population. Our estimate of the time of admixture is closer to the historical dates than those in previous works which did not consider ascertainment bias. Although the use of whole genome sequences for demographic inference is becoming the norm, there are still underrepresented areas of the world from where only SNP array data are available. Our inference framework is applicable to those cases and will help with the demographic inference.

UR - http://www.scopus.com/inward/record.url?scp=85049656945&partnerID=8YFLogxK

U2 - 10.1038/s41598-018-28539-y

DO - 10.1038/s41598-018-28539-y

M3 - Article

VL - 8

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

M1 - 10209

ER -