Context aware group nearest shrunken centroids in large-scale genomic studies

Juemin Yang, Fang Han, Rafael A. Irizarry, Han Liu

Research output: Contribution to journal › Conference article › peer-review



Recent genomic studies have identified genes related to specific phenotypes. In addition to marginal association analysis of individual genes, analyzing gene pathways (functionally related sets of genes) may yield additional valuable insights. We have devised an approach to phenotype classification from gene expression profiling. Our method, named "group Nearest Shrunken Centroids (gNSC)", is an enhancement of Nearest Shrunken Centroids (NSC) (Tibshirani, Hastie, Narasimhan and Chu 2002), a popular and scalable method for analyzing big data. While fully utilizing the variable structure of gene pathways, gNSC achieves computational speed comparable to that of NSC when the group size is small. Compared with NSC, gNSC improves classification power by utilizing gene pathway information. In practice, we investigate the performance of gNSC on one of the largest microarray datasets aggregated from the internet. We show the effectiveness of our method by comparing the misclassification rate of gNSC with that of NSC. Additionally, we present a novel application of NSC/gNSC to context analysis of the association between pathways and certain medical words. Some recent biological findings are rediscovered.
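For readers unfamiliar with the idea, the following is a minimal sketch of nearest shrunken centroids with an optional group-wise shrinkage step. It is not the authors' implementation: the function names, the equal-prior distance rule, the choice of `m_k = sqrt(1/n_k - 1/n)`, and in particular the group rule in `group_soft_threshold` (which shrinks all genes in a pathway jointly by the norm of their standardized differences) are assumptions made for illustration only. The abstract does not specify how gNSC shrinks groups.

```python
import numpy as np

def soft_threshold(d, delta):
    """Elementwise soft-thresholding: sign(d) * max(|d| - delta, 0)."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def group_soft_threshold(d, groups, delta):
    """Hypothetical group rule: shrink each gene group jointly by its norm.

    A group whose norm is at most delta is set to zero as a whole.
    """
    out = np.zeros_like(d)
    for idx in groups:
        norm = np.linalg.norm(d[idx])
        if norm > delta:
            out[idx] = d[idx] * (1.0 - delta / norm)
    return out

def shrunken_centroids(X, y, delta, groups=None):
    """Return per-class shrunken centroids and the per-gene scale s + s0.

    X is (samples, genes); y holds class labels; groups is an optional
    list of index lists (one per pathway). Assumes at least two classes.
    """
    classes = np.unique(y)
    n, p = X.shape
    overall = X.mean(axis=0)

    # Pooled within-class standard deviation for each gene.
    resid = np.zeros(p)
    for k in classes:
        Xk = X[y == k]
        resid += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    s = np.sqrt(resid / (n - len(classes)))
    s0 = np.median(s)  # stabilizing constant (assumed choice)

    centroids = {}
    for k in classes:
        Xk = X[y == k]
        m_k = np.sqrt(1.0 / len(Xk) - 1.0 / n)
        # Standardized difference between class centroid and overall centroid.
        d = (Xk.mean(axis=0) - overall) / (m_k * (s + s0))
        if groups is None:
            d = soft_threshold(d, delta)
        else:
            d = group_soft_threshold(d, groups, delta)
        centroids[k] = overall + m_k * (s + s0) * d
    return centroids, s + s0

def predict(X, centroids, scale):
    """Assign each sample to the nearest shrunken centroid (equal priors)."""
    classes = list(centroids)
    dists = np.stack(
        [(((X - centroids[k]) / scale) ** 2).sum(axis=1) for k in classes],
        axis=1,
    )
    return np.array(classes)[dists.argmin(axis=1)]
```

With `groups=None` the sketch reduces to gene-by-gene shrinkage; passing a list of index lists switches to the joint rule, so a whole pathway is either retained or removed from the classifier together.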

Original language: English (US)
Pages (from-to): 1051-1059
Number of pages: 9
Journal: Journal of Machine Learning Research
State: Published - Jan 1 2014
Event: 17th International Conference on Artificial Intelligence and Statistics, AISTATS 2014 - Reykjavik, Iceland
Duration: Apr 22 2014 to Apr 25 2014

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

