Performance evaluation and characterization of scalable data mining algorithms

Ying Liu*, Jayaprakash Pisharath, Wei-Keng Liao, Gokhan Memik, Alok Nidhi Choudhary, Pradeep Dubey

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

10 Scopus citations


Data mining has become one of the most essential tools in diverse fields. Increases in data size and algorithmic complexity require the computational power of chips to grow even further. In this paper, we present detailed characteristics from the hardware and software perspectives for a set of representative data mining programs. We first design MineBench, a benchmarking suite containing representative data mining applications from multiple categories, including two classification, two association rule mining, and four clustering applications. We evaluate the MineBench applications on an 8-way Shared Memory Parallel (SMP) machine and analyze their important performance characteristics. During the evaluation, the input datasets and the number of processors used are varied to measure the scalability of the applications in our benchmark suite. We present the results based on characteristics such as scalability, I/O complexity, fraction of time spent in the OS mode, and communication/synchronization overheads. This information can help designers of future systems and programmers of new data mining algorithms achieve better system and algorithmic performance.

Original language: English (US)
Article number: 439-213
Pages (from-to): 620-625
Number of pages: 6
Journal: Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems
State: Published - 2004
Event: Proceedings of the 16th IASTED International Conference on Parallel and Distributed Computing and Systems - Cambridge, MA, United States
Duration: Nov 9 2004 → Nov 11 2004

Keywords


  • Data mining, benchmark
  • Parallel computing
  • Performance evaluation

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

