High performance data mining using R on heterogeneous platforms

Prabhat Kumar*, Berkin Ozisikyilmaz, Wei-Keng Liao, Gokhan Memik, Alok Nidhi Choudhary

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

11 Scopus citations

Abstract

The exponential increase in the generation and collection of data has led us in a new era of data analysis and information extraction. Conventional systems based on general-purpose processors are unable to keep pace with the heavy computational requirements of data mining techniques. High performance coprocessors like GPUs and FPGAs have the potential to handle large computational workloads. In this paper, we present a scalable framework aimed at providing a platform for developing and using high performance data mining applications on heterogeneous platforms. The framework incorporates a software infrastructure and a library of high performance kernels. Furthermore, it includes a variety of optimizations which increase the throughput of applications. The framework spans multiple technologies including R, GPUs, multi-core CPUs, MPI, and parallel-netCDF harnessing their capabilities for high-performance computations. This paper also introduces the concept of interleaving GPU kernels from multiple applications providing significant performance gain. Thus, in comparison to other tools available for data mining, our framework provides an easy-to-use and scalable environment both for application development and execution. The framework is available as a software package which can be easily integrated in the R programming environment.

Original languageEnglish (US)
Title of host publication2011 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2011
Pages1720-1729
Number of pages10
DOIs
StatePublished - Dec 20 2011
Event25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011 - Anchorage, AK, United States
Duration: May 16 2011May 20 2011

Publication series

NameIEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum

Other

Other25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011
CountryUnited States
CityAnchorage, AK
Period5/16/115/20/11

Keywords

  • Data Mining
  • Fuzzy K-Means
  • GPU
  • K-Means
  • MPI
  • PCA
  • Parallel-netCDF
  • R

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Theoretical Computer Science

Fingerprint Dive into the research topics of 'High performance data mining using R on heterogeneous platforms'. Together they form a unique fingerprint.

Cite this