Abstract
We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.
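As a rough illustration of the first stage described in the abstract, the sketch below ranks discrete features with a simplified ReliefF-style weight for a binary outcome. It is not the implementation used in the study: the function names (`relieff_scores`, `top_features`), the Hamming distance, the use of every instance rather than a random sample, and the default `k=1` are all assumptions made for the example.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two discrete feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def relieff_scores(X: List[Sequence[int]], y: Sequence[int], k: int = 1) -> List[float]:
    """Simplified ReliefF-style weights for discrete features, binary target.

    For each instance, find its k nearest same-class neighbours (hits) and
    k nearest other-class neighbours (misses). A feature gains weight when
    it differs on misses and loses weight when it differs on hits.
    """
    n, m = len(X), len(X[0])
    weights = [0.0] * m
    for i in range(n):
        by_distance = lambda j: hamming(X[i], X[j])
        hits = sorted((j for j in range(n) if j != i and y[j] == y[i]), key=by_distance)[:k]
        misses = sorted((j for j in range(n) if y[j] != y[i]), key=by_distance)[:k]
        for f in range(m):
            # Average disagreement with hits and misses on feature f.
            hit_diff = sum(X[i][f] != X[j][f] for j in hits) / max(len(hits), 1)
            miss_diff = sum(X[i][f] != X[j][f] for j in misses) / max(len(misses), 1)
            weights[f] += (miss_diff - hit_diff) / n
    return weights


def top_features(X: List[Sequence[int]], y: Sequence[int], n_keep: int, k: int = 1) -> List[int]:
    """Indices of the n_keep highest-scoring features (stage one of two)."""
    scores = relieff_scores(X, y, k)
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:n_keep]


# Toy data (made up for the example): feature 0 matches the outcome, feature 1 is noise.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
selected = top_features(X, y, n_keep=1)
```

In a two-stage setup, only the columns in `selected` would then be passed to the chosen prediction method. This version scans all pairs of instances, so it is suitable only for small examples, not for data sets with millions of SNPs.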
Original language | English (US) |
---|---|
Pages (from-to) | 912-921 |
Number of pages | 10 |
Journal | Briefings in Bioinformatics |
Volume | 16 |
Issue number | 6 |
State | Published - Feb 6 2015 |
Keywords
- Bayesian network
- Big data
- GWAS
- High-dimensional data
- Prediction
- SNP
ASJC Scopus subject areas
- Information Systems
- Molecular Biology