TY - JOUR
T1 - Fast and accurate inference of local ancestry in Latino populations
AU - Baran, Yael
AU - Pasaniuc, Bogdan
AU - Sankararaman, Sriram
AU - Torgerson, Dara G.
AU - Gignoux, Christopher
AU - Eng, Celeste
AU - Rodriguez-Cintron, William
AU - Chapela, Rocio
AU - Ford, Jean G.
AU - Avila, Pedro C.
AU - Rodriguez-Santana, Jose
AU - Burchard, Esteban Gonzàlez
AU - Halperin, Eran
N1 - Funding Information: This study was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. E.H. and Y.B. were partially supported by the Israeli Science Foundation, grant no. 04514831, and by the IBM open collaborative research. B.P. was supported by National Institutes of Health grant R01 HG006399.
PY - 2012/5
Y1 - 2012/5
AB - Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.
UR - http://www.scopus.com/inward/record.url?scp=84861127863&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84861127863&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bts144
DO - 10.1093/bioinformatics/bts144
M3 - Article
C2 - 22495753
AN - SCOPUS:84861127863
SN - 1367-4803
VL - 28
SP - 1359
EP - 1367
JO - Bioinformatics
JF - Bioinformatics
IS - 10
M1 - bts144
ER -