TY - JOUR

T1 - An E-M algorithm and testing strategy for multiple-locus haplotypes

AU - Long, J. C.

AU - Williams, R. C.

AU - Urbanek, M.

PY - 1995

Y1 - 1995

N2 - This paper gives an expectation-maximization (EM) algorithm to obtain allele frequencies, haplotype frequencies, and gametic disequilibrium coefficients for multiple-locus systems. It permits high polymorphism and null alleles at all loci. This approach effectively deals with the primary estimation problems associated with such systems; that is, there is not a one-to-one correspondence between phenotypic and genotypic categories, and sample sizes tend to be much smaller than the number of phenotypic categories. The EM method provides maximum-likelihood estimates and therefore allows hypothesis tests using likelihood-ratio statistics that have χ² distributions in large samples. We also suggest a data resampling approach to estimate test statistic sampling distributions. The resampling approach is more computer intensive, but it is applicable to all sample sizes. A strategy to test hypotheses about aggregate groups of gametic disequilibrium coefficients is recommended. This strategy minimizes the number of necessary hypothesis tests while at the same time describing the structure of disequilibrium. These methods are applied to three unlinked dinucleotide repeat loci in Navajo Indians and to three linked HLA loci in Gila River (Pima) Indians. The likelihood functions of both data sets are shown to be maximized by the EM estimates, and the testing strategy provides a useful description of the structure of gametic disequilibrium. Following these applications, a number of simulation experiments are performed to test how well the likelihood-ratio statistic distributions are approximated by χ² distributions. In most circumstances the χ² approximation grossly underestimated the probability of type I errors. However, at times it also overestimated the type I error probability. Accordingly, we recommend hypothesis tests that use the resampling method.

UR - http://www.scopus.com/inward/record.url?scp=0028913523&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028913523&partnerID=8YFLogxK

M3 - Article

C2 - 7887436

AN - SCOPUS:0028913523

SN - 0002-9297

VL - 56

SP - 799

EP - 810

JO - American journal of human genetics

JF - American journal of human genetics

IS - 3

ER -