TY - GEN
T1 - High performance data mining using R on heterogeneous platforms
AU - Kumar, Prabhat
AU - Ozisikyilmaz, Berkin
AU - Liao, Wei-Keng
AU - Memik, Gokhan
AU - Choudhary, Alok Nidhi
PY - 2011
Y1 - 2011
N2 - The exponential increase in the generation and collection of data has led us into a new era of data analysis and information extraction. Conventional systems based on general-purpose processors are unable to keep pace with the heavy computational requirements of data mining techniques. High-performance coprocessors like GPUs and FPGAs have the potential to handle large computational workloads. In this paper, we present a scalable framework aimed at providing a platform for developing and using high-performance data mining applications on heterogeneous platforms. The framework incorporates a software infrastructure and a library of high-performance kernels. Furthermore, it includes a variety of optimizations that increase the throughput of applications. The framework spans multiple technologies, including R, GPUs, multi-core CPUs, MPI, and parallel-netCDF, harnessing their capabilities for high-performance computations. This paper also introduces the concept of interleaving GPU kernels from multiple applications, which provides a significant performance gain. Thus, in comparison to other tools available for data mining, our framework provides an easy-to-use and scalable environment for both application development and execution. The framework is available as a software package that can be easily integrated into the R programming environment.
AB - The exponential increase in the generation and collection of data has led us into a new era of data analysis and information extraction. Conventional systems based on general-purpose processors are unable to keep pace with the heavy computational requirements of data mining techniques. High-performance coprocessors like GPUs and FPGAs have the potential to handle large computational workloads. In this paper, we present a scalable framework aimed at providing a platform for developing and using high-performance data mining applications on heterogeneous platforms. The framework incorporates a software infrastructure and a library of high-performance kernels. Furthermore, it includes a variety of optimizations that increase the throughput of applications. The framework spans multiple technologies, including R, GPUs, multi-core CPUs, MPI, and parallel-netCDF, harnessing their capabilities for high-performance computations. This paper also introduces the concept of interleaving GPU kernels from multiple applications, which provides a significant performance gain. Thus, in comparison to other tools available for data mining, our framework provides an easy-to-use and scalable environment for both application development and execution. The framework is available as a software package that can be easily integrated into the R programming environment.
KW - Data Mining
KW - Fuzzy K-Means
KW - GPU
KW - K-Means
KW - MPI
KW - PCA
KW - Parallel-netCDF
KW - R
UR - http://www.scopus.com/inward/record.url?scp=83455220601&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=83455220601&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2011.329
DO - 10.1109/IPDPS.2011.329
M3 - Conference contribution
AN - SCOPUS:83455220601
SN - 9780769543857
T3 - IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
SP - 1720
EP - 1729
BT - 2011 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2011
T2 - 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011
Y2 - 16 May 2011 through 20 May 2011
ER -