Microarray gene expression data have been used in genome-wide association studies to study gene regulation and other complex phenotypes, including disease risk and drug response. To reach scientifically sound conclusions from these studies, however, reliable summarization of gene expression intensities is necessary. Among the factors that can affect expression profiling on a microarray platform, single nucleotide polymorphisms (SNPs) in target mRNA may reduce measured signal intensity and produce spurious results. The recently released 1000 Genomes Project dataset provides an opportunity to evaluate the distribution of both known and novel SNPs in the International HapMap Project lymphoblastoid cell lines (LCLs). We mapped the 1000 Genomes Project genotypic data to the Affymetrix GeneChip Human Exon 1.0ST array (exon array), which had been used in our previous studies and for which gene expression data had been made publicly available. We also evaluated the potential impact of these SNPs on the differentially spliced probesets we had identified previously. Although the 1000 Genomes Project data allowed a comprehensive survey of the SNPs in this particular array, the same approach can be applied to other microarray platforms. Furthermore, we present a detailed catalogue of SNP-containing probesets (exon-level) and transcript clusters (gene-level), which can inform the evaluation of findings obtained with the exon array and benefit the design of follow-up experiments and data re-analysis.
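As an illustration of the mapping step, the following is a minimal sketch of how SNP positions might be intersected with probe coordinates to flag SNP-containing probesets. It is not the pipeline used in the study: the function name `snp_containing_probesets`, the tuple layouts, and the use of 1-based inclusive coordinates on a shared genome build are all assumptions made for the example.

```python
from bisect import bisect_left
from collections import defaultdict


def snp_containing_probesets(probes, snps):
    """Return {probeset_id: sorted SNP positions falling inside its probes}.

    probes: iterable of (probeset_id, chrom, start, end), with 1-based
            inclusive coordinates (hypothetical layout for this sketch).
    snps:   iterable of (chrom, pos) on the same genome build.
    """
    # Index SNP positions by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = defaultdict(set)
    for probeset_id, chrom, start, end in probes:
        positions = by_chrom.get(chrom, [])
        # Jump to the first SNP at or after the probe start, then walk
        # forward while SNPs still fall within the probe.
        i = bisect_left(positions, start)
        while i < len(positions) and positions[i] <= end:
            hits[probeset_id].add(positions[i])
            i += 1

    # Probesets with no overlapping SNP are omitted from the result.
    return {ps: sorted(pos_set) for ps, pos_set in hits.items()}
```

The same grouping could be repeated at the gene level by replacing the probeset identifier with a transcript cluster identifier in the input tuples.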