TY - JOUR
T1 - Accurate and Fast Multiple-Testing Correction in eQTL Studies
AU - Sul, Jae Hoon
AU - Raj, Towfique
AU - de Jong, Simone
AU - de Bakker, Paul I.W.
AU - Raychaudhuri, Soumya
AU - Ophoff, Roel A.
AU - Stranger, Barbara Elaine
AU - Eskin, Eleazar
AU - Han, Buhm
N1 - Funding Information:
J.H.S. and E.E. are supported by National Science Foundation grants 0513612, 0731455, 0729049, 0916676, 1065276, 1302448, and 1320589 and by NIH grants K25-HL080079, U01-DA024417, P01-HL30568, P01-HL28481, R01-GM083198, R01-ES021801, R01-MH101782, U54EB020403, and R01-ES022282. B.H. is supported by grants 2015-7011 and 2015-0222 from the Asan Institute for Life Sciences at the Asan Medical Center in Seoul, Korea. S.J. and R.O. are supported by NIH grant R01-MH090553. S.R. is funded in part by NIH grants 1R01AR063759-01A1, 5U01GM092691-04, and UH2AR067677-01. B.S. is funded by NIH grant 1U01HG007598-01. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the NIH. Additional funds were provided by the National Cancer Institute (NCI), National Human Genome Research Institute, NHLBI, National Institute on Drug Abuse, NIMH, and National Institute of Neurological Disorders and Stroke. Additional funds were provided by NCI and SAIC-Frederick (SAIC-F) subcontracts to the National Disease Research Interchange (10XS170), the Roswell Park Cancer Institute (10XS171), Science Care (X10S172), and a contract (HHSN268201000029C) to the Broad Institute. Biorepository operations were funded through an SAIC-F subcontract (10ST1035) to the Van Andel Institute. Additional data repository and project management were provided by SAIC-F (HHSN261200800001E). The Brain Bank was supported by a supplement to University of Miami grant DA006227. Grant support for statistical-methods development was provided by NIH grants MH090941, MH090951, MH090937, MH090936, and MH090948. The GTEx data were obtained from dbGaP accession number phs000424.v3.p1 on November 10, 2014.
Publisher Copyright:
© 2015 The American Society of Human Genetics
PY - 2015/5/1
Y1 - 2015/5/1
AB - In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum p value among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck in eQTL studies. In this paper, we propose an efficient approach for correcting for multiple testing and assess eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset.
UR - http://www.scopus.com/inward/record.url?scp=84930024854&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84930024854&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2015.04.012
DO - 10.1016/j.ajhg.2015.04.012
M3 - Article
C2 - 26027500
AN - SCOPUS:84930024854
SN - 0002-9297
VL - 96
SP - 857
EP - 868
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 6
ER -