Complementary feature selection from alternative splicing events and gene expression for phenotype prediction

Charles J. Labuzzetta*, Margaret L. Antonio, Patricia M. Watson, Robert C. Wilson, Lauren A. Laboissonniere, Jeffrey M. Trimarchi, Baris Genc, P. Hande Ozdinler, Dennis K. Watson, Paul E. Anderson

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations


Motivation: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods for predicting phenotypes from RNA-Seq datasets use machine learning algorithms trained on gene expression. Isoforms generated by alternative splicing, however, may provide a novel and complementary set of transcripts for phenotype prediction. Because alternative splicing produces numerous patterns, isoforms substantially outnumber genes, which creates a feature prioritization problem for many machine learning algorithms. This study identifies empirically optimal methods for transcript quantification, feature engineering and filtering, using phenotype prediction accuracy as the metric. It also analyzes the complementary nature of gene and isoform data and examines the feasibility of identifying isoforms as biomarker candidates.

Results: Isoform features are complementary to gene features: they provide non-redundant information and enhance predictive power when prioritized and filtered. This study describes and evaluates a univariate filtering algorithm that selects up to the N highest-ranking features for phenotype prediction. It also reports an empirical comparison of isoform quantification pipelines, based on cross-validation prediction tests with three datasets, each including samples of diseased and non-diseased phenotypes: human non-small cell lung cancer (NSCLC) patients, human patients with chronic obstructive pulmonary disease (COPD) and amyotrophic lateral sclerosis (ALS) transgenic mice.

Availability and Implementation:
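The abstract describes a univariate filter that keeps up to the N highest-ranking features and evaluates pipelines by cross-validated prediction accuracy. The following minimal Python sketch illustrates that general idea on pooled gene and isoform features. It is not the authors' implementation: the ranking score (an absolute Welch t-statistic), the `min_score` threshold, the nearest-centroid classifier, leave-one-out cross-validation, the function names and the toy values are all assumptions chosen for illustration. Feature selection is rerun inside every fold so that the held-out sample never influences which features are kept.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two groups (an assumed ranking score)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if se == 0.0 else abs(mean(a) - mean(b)) / se


def select_top_n(rows, labels, names, n, min_score=0.0):
    """Rank every feature by its univariate score and keep up to n of them.

    Features whose score does not exceed min_score are discarded, so fewer
    than n names can be returned. Ties are broken alphabetically by name.
    """
    scored = []
    for j, name in enumerate(names):
        g0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        g1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        score = welch_t(g0, g1)
        if score > min_score:
            scored.append((score, name))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:n]]


def nearest_centroid_predict(train_rows, train_labels, row):
    """Assign the class whose training centroid is closest (squared Euclidean)."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_labels)):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        centroid = [mean(col) for col in zip(*members)]
        dist = sum((x - c) ** 2 for x, c in zip(row, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(rows, labels, names, n):
    """Leave-one-out accuracy, re-running feature selection inside every fold."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        keep = select_top_n(train_rows, train_labels, names, n)
        idx = [names.index(k) for k in keep]
        train_x = [[r[j] for j in idx] for r in train_rows]
        test_x = [rows[i][j] for j in idx]
        pred = nearest_centroid_predict(train_x, train_labels, test_x)
        correct += int(pred == labels[i])
    return correct / len(rows)


# Hypothetical toy data: gene-level and isoform-level features side by side.
# GENE_A is the sum of its two isoforms; the isoforms switch between phenotypes.
names = ["GENE_A", "GENE_B", "GENE_A-iso1", "GENE_A-iso2", "GENE_B-iso1"]
rows = [
    [5.0, 1.0, 1.0, 4.0, 2.0],  # phenotype 0
    [5.4, 2.0, 1.2, 4.2, 3.0],  # phenotype 0
    [4.6, 3.0, 0.8, 3.8, 4.0],  # phenotype 0
    [5.0, 7.0, 4.0, 1.0, 2.5],  # phenotype 1
    [5.4, 8.0, 4.2, 1.2, 3.5],  # phenotype 1
    [4.6, 9.0, 3.8, 0.8, 4.5],  # phenotype 1
]
labels = [0, 0, 0, 1, 1, 1]

print(select_top_n(rows, labels, names, n=3))  # → ['GENE_A-iso1', 'GENE_A-iso2', 'GENE_B']
print(loo_accuracy(rows, labels, names, n=2))  # → 1.0
```

On this hypothetical data the gene-level total GENE_A has identical group means and is filtered out, while its two isoforms rank highest because they change in opposite directions. This is one simple way isoform features can carry information that a gene-level sum hides; it says nothing about the datasets analyzed in the study.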

Original language: English (US)
Pages (from-to): i421-i429
Issue number: 17
State: Published - Sep 1 2016

ASJC Scopus subject areas

  • Computational Mathematics
  • Molecular Biology
  • Biochemistry
  • Statistics and Probability
  • Computer Science Applications
  • Computational Theory and Mathematics

