Abstract
RNA sequencing data are generated in abundance in biomedical research for biomarker discovery and other studies. At the exon level, such data are usually heavy-tailed and correlated. Conventional statistical tests for differential expression that are based on the mean or median difference are likely to suffer from low power when the between-group difference occurs mostly in the upper or lower tail of the gene expression distribution. We propose a tail-based test that compares groups over a specified region of the distribution rather than at a single location. The proposed test, which is derived from quantile regression, adjusts for covariates and accounts for within-sample dependence among the exons through a specified correlation structure. Through Monte Carlo simulation studies, we show that the proposed test is generally more powerful and robust in detecting differential expression than commonly used tests based on the mean or a single quantile. An application to TCGA lung adenocarcinoma data demonstrates the promise of the proposed method for biomarker discovery.
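The intuition behind comparing a region of the distribution rather than a single location can be illustrated with a small simulation. The sketch below is not the test proposed in the paper: it ignores covariates and within-sample correlation among exons, handles a single feature, and replaces the quantile-regression machinery with a plain permutation test. The function names (`quantile`, `tail_stat`, `mean_stat`, `perm_pvalue`), the grid of tail levels, and the simulated log-normal data are all assumptions made for illustration.

```python
import random

def quantile(sorted_vals, tau):
    """Linear-interpolation quantile of an already sorted list (0 <= tau <= 1)."""
    pos = tau * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def tail_stat(x, y, taus):
    """Average of (quantile of y minus quantile of x) over the levels in taus."""
    xs, ys = sorted(x), sorted(y)
    return sum(quantile(ys, t) - quantile(xs, t) for t in taus) / len(taus)

def mean_stat(x, y, taus=None):
    """Difference in sample means; taus is ignored (kept for a uniform signature)."""
    return sum(y) / len(y) - sum(x) / len(x)

def perm_pvalue(x, y, stat, taus, n_perm=500, seed=0):
    """Two-sided permutation p-value, treating group labels as exchangeable."""
    rng = random.Random(seed)
    observed = abs(stat(x, y, taus))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(stat(pooled[:len(x)], pooled[len(x):], taus)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Simulated heavy-tailed data (an assumption, not TCGA data):
# group y is inflated only above its own 70th percentile.
rng = random.Random(1)
x = [rng.lognormvariate(0, 1) for _ in range(150)]
y = [rng.lognormvariate(0, 1) for _ in range(150)]
cut = quantile(sorted(y), 0.70)
y = [v * 1.8 if v > cut else v for v in y]

taus = [0.75, 0.80, 0.85, 0.90, 0.95]  # hypothetical upper-tail grid
p_tail = perm_pvalue(x, y, tail_stat, taus)
p_mean = perm_pvalue(x, y, mean_stat, taus)
print(f"tail-based p = {p_tail:.3f}, mean-based p = {p_mean:.3f}")
```

Averaging quantile differences over a grid of levels is one simple way to target a distribution area; a single quantile would correspond to a grid with one entry. Repeating the simulation over many seeds would be needed before drawing any conclusion about relative power.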
Original language | English (US) |
---|---|
Pages (from-to) | 261-276 |
Number of pages | 16 |
Journal | Statistical Methods in Medical Research |
Volume | 30 |
Issue number | 1 |
State | Published - Jan 2021 |
Keywords
- Correlated data
- RNA sequencing
- Differential expression analysis
- Quantile regression
- Robust tail-based test
ASJC Scopus subject areas
- Health Information Management
- Epidemiology
- Statistics and Probability