Abstract
Data mining has become an essential tool in many fields. Growing data sizes and increasingly complex algorithms demand ever more computational power from processors. In this paper, we characterize a set of representative data mining programs from both the hardware and the software perspective. We first design MineBench, a benchmarking suite of representative data mining applications spanning multiple categories: two classification, two association rule mining, and four clustering applications. We evaluate the MineBench applications on an 8-way Shared Memory Parallel (SMP) machine and analyze their key performance characteristics. During the evaluation, we vary the input datasets and the number of processors to measure the scalability of each application in the suite. We report results on characteristics such as scalability, I/O complexity, the fraction of time spent in OS mode, and communication/synchronization overheads. This information can help designers of future systems and developers of new data mining algorithms achieve better system and algorithmic performance.
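The abstract does not say how scalability is quantified. A common choice is relative speedup, S(p) = T(1) / T(p), together with parallel efficiency, E(p) = S(p) / p, where T(p) is the run time on p processors. The following is a minimal Python sketch that assumes those definitions. The function name `scalability` and its input format (a mapping from processor count to wall-clock time) are hypothetical and are not part of MineBench.

```python
def scalability(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {p: (speedup, efficiency)} relative to the 1-processor run.

    Hypothetical helper: `timings` maps processor count p to measured
    wall-clock time T(p) for one fixed input dataset.
    """
    if 1 not in timings:
        raise ValueError("a 1-processor baseline T(1) is required")
    t1 = timings[1]
    result: dict[int, tuple[float, float]] = {}
    for p, tp in sorted(timings.items()):
        speedup = t1 / tp          # S(p) = T(1) / T(p)
        efficiency = speedup / p   # E(p) = S(p) / p; 1.0 means linear scaling
        result[p] = (speedup, efficiency)
    return result
```

Efficiency is reported alongside speedup because it makes sublinear scaling easy to spot: an efficiency that falls as p grows points to overheads such as I/O, OS time, or synchronization, which are the characteristics the abstract lists.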
Original language | English (US) |
---|---|
Article number | 439-213 |
Pages (from-to) | 620-625 |
Number of pages | 6 |
Journal | Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems |
Volume | 16 |
State | Published - 2004 |
Event | Proceedings of the 16th IASTED International Conference on Parallel and Distributed Computing and Systems - Cambridge, MA, United States Duration: Nov 9 2004 → Nov 11 2004 |
Keywords
- Data mining, benchmark
- Parallel computing
- Performance evaluation
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Networks and Communications