Feature selection and molecular classification of cancer using genetic programming

Jianjun Yu, Jindan Yu, Arpit A. Almal, Saravana M. Dhanasekaran, Debashis Ghosh, William P. Worzel*, Arul M. Chinnaiyan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

60 Scopus citations


Despite important advances in microarray-based molecular classification of tumors, applying it in clinical settings remains challenging. This is partly due to the limited ability of current analysis programs to discover robust biomarkers and to build classifiers from a practical number of genes. Genetic programming (GP) is a machine learning technique that uses an evolutionary algorithm to simulate natural selection and population dynamics, which can yield simple and comprehensible classifiers. Here we applied GP to cancer expression profiling data to select feature genes and to build molecular classifiers that combine these genes in mathematical expressions. Analysis of thousands of GP classifiers generated for a prostate cancer data set revealed repeated use of a set of highly discriminative feature genes, many of which are known to be associated with the disease. GP classifiers often comprise five or fewer genes and successfully predict cancer types and subtypes. More importantly, GP classifiers generated in one study are able to predict samples from an independent study, even one that may have used a different microarray platform. In addition, GP yielded classification accuracy similar to or better than that of conventional classification methods. Furthermore, the mathematical expressions of GP classifiers provide insight into the relationships between classifier genes. Taken together, our results demonstrate that GP may be valuable for generating effective classifiers with a practical number of genes for diagnostic and prognostic cancer classification.
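The abstract describes GP only at a high level. The Python block below is a minimal, hypothetical sketch of tree-based GP classification, not the authors' implementation: each classifier is an arithmetic expression over gene-expression values, a sample is called class 1 when the expression is positive, and a population of such expressions is evolved by tournament selection, subtree crossover and mutation, with the two best individuals kept each generation. The operator set, the 31-node size cap, all rates, and the synthetic data in the `__main__` section are assumptions chosen for illustration. The final loop tallies how often each gene appears in the best classifier from several independent runs, a toy analogue of the feature-gene frequency analysis described above.

```python
import random
from collections import Counter

# Assumed function set for this sketch: three binary arithmetic operators.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
MAX_NODES = 31  # hypothetical size cap that keeps evolved classifiers small


def random_tree(n_genes, depth, rng):
    """Grow a random expression; leaves are gene indices or constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return ("gene", rng.randrange(n_genes))
        return ("const", round(rng.uniform(-1.0, 1.0), 2))
    op = rng.choice(list(OPS))
    return (op, random_tree(n_genes, depth - 1, rng),
            random_tree(n_genes, depth - 1, rng))


def evaluate(tree, x):
    """Value of the expression for one sample's expression vector x."""
    if tree[0] == "gene":
        return x[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def accuracy(tree, samples, labels):
    """Fraction correct under the assumed rule: class 1 if the value is > 0."""
    hits = sum((evaluate(tree, x) > 0) == bool(y) for x, y in zip(samples, labels))
    return hits / len(labels)


def subtrees(tree, path=()):
    """Yield (path, node) for every node; a path lists child indices."""
    yield path, tree
    if tree[0] in OPS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    """Copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(node)


def genes_used(tree):
    """Set of gene indices that appear in a classifier."""
    return {node[1] for _, node in subtrees(tree) if node[0] == "gene"}


def evolve(samples, labels, n_genes, pop_size=50, generations=25, seed=0):
    """Evolve one classifier with tournament selection, crossover and mutation."""
    rng = random.Random(seed)
    pop = [random_tree(n_genes, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((accuracy(t, samples, labels), t) for t in pop),
                        key=lambda s: s[0], reverse=True)
        nxt = [t for _, t in scored[:2]]  # elitism: keep the two best as-is
        while len(nxt) < pop_size:
            # Size-3 tournaments on the precomputed fitness scores.
            p1 = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            path, _ = rng.choice(list(subtrees(p1)))
            _, donor = rng.choice(list(subtrees(p2)))
            child = replace_at(p1, path, donor)  # subtree crossover
            if rng.random() < 0.2:  # mutation: regrow a random subtree
                path, _ = rng.choice(list(subtrees(child)))
                child = replace_at(child, path, random_tree(n_genes, 2, rng))
            if sum(1 for _ in subtrees(child)) > MAX_NODES:
                child = p1  # reject oversized offspring
            nxt.append(child)
        pop = nxt
    return max(pop, key=lambda t: accuracy(t, samples, labels))


if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [[data_rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(40)]
    y = [int(x[0] - x[3] > 0) for x in X]  # synthetic labels driven by genes 0 and 3
    usage = Counter()
    for seed in range(5):
        usage.update(genes_used(evolve(X, y, n_genes=10, seed=seed)))
    print(usage.most_common(3))
```

Because the best individuals are copied unchanged each generation, the best training accuracy in this sketch never decreases across generations. A real analysis would also score classifiers on held-out samples, as the study does by predicting samples from an independent study.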

Original language: English (US)
Pages (from-to): 292-303
Number of pages: 12
Issue number: 4
State: Published - Apr 2007


Keywords

  • Biomarkers
  • Evolutionary algorithm
  • Microarray profiling
  • Molecular diagnostics
  • Prostate cancer

ASJC Scopus subject areas

  • Cancer Research

