TY - JOUR
T1 - Feature Selection for Optimized High-Dimensional Biomedical Data Using an Improved Shuffled Frog Leaping Algorithm
AU - Hu, Bin
AU - Dai, Yongqiang
AU - Su, Yun
AU - Moore, Philip
AU - Zhang, Xiaowei
AU - Mao, Chengsheng
AU - Chen, Jing
AU - Xu, Lixin
N1 - Funding Information:
Bin Hu is the corresponding author. Yongqiang Dai is the co-first author. This work is supported by the National Basic Research Program of China (973 Program) (no. 2014CB744600), the National Natural Science Foundation of China (nos. 61402211, 61063028 and 61210010), the Natural Science Foundation of Gansu Province (no. 1506RJZA007), and the Natural Science Foundation of the Higher Education Institutions of Gansu Province, China (no. 2015A-008).
Publisher Copyright:
© 2004-2012 IEEE.
PY - 2018/11/1
Y1 - 2018/11/1
AB - High-dimensional biomedical datasets contain thousands of features that can be used in the molecular diagnosis of disease; however, such datasets also contain many irrelevant or weakly correlated features, which affect the predictive accuracy of diagnosis. Without a feature selection algorithm, it is difficult for existing classification techniques to accurately identify patterns in the features. The purpose of feature selection is not only to identify a feature subset from the original set of features without reducing the predictive accuracy of the classification algorithm, but also to reduce the computational overhead of data mining. In this paper, we present an improved shuffled frog leaping algorithm that introduces a chaos memory weight factor, an absolute balance group strategy, and an adaptive transfer factor. The proposed approach explores the space of possible subsets to obtain the set of features that maximizes predictive accuracy and minimizes irrelevant features in high-dimensional biomedical data. To evaluate the effectiveness of the proposed method, we use the K-nearest neighbor method and compare our approach with genetic algorithms, particle swarm optimization, and the shuffled frog leaping algorithm. Experimental results show that the improved algorithm yields improvements in both the identification of relevant subsets and classification accuracy.
KW - Shuffled frog leaping algorithm
KW - biomedical data
KW - classification accuracy
KW - feature selection
KW - k-nearest neighbor
UR - http://www.scopus.com/inward/record.url?scp=85052646831&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052646831&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2016.2602263
DO - 10.1109/TCBB.2016.2602263
M3 - Article
C2 - 28113635
AN - SCOPUS:85052646831
SN - 1545-5963
VL - 15
SP - 1765
EP - 1773
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 6
M1 - 7551172
ER -