Parallel data mining algorithms for association rules and clustering

Jianwei Li, Wei Keng Liao, Alok Choudhary, Ying Liu

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

5 Scopus citations

Abstract

Volumes of data are exploding in both scientific and commercial domains. Data mining techniques that extract information from huge amounts of data have become popular in many applications. Algorithms are designed to analyze those volumes of data automatically and efficiently, so that users can grasp the knowledge latent in the data without manually looking through the massive data itself. However, the performance of computer systems is improving more slowly than the demand for data mining applications is growing. Recent trends suggest that system performance has been improving at a rate of 10–15% per year, whereas the volume of data collected nearly doubles every year. As data sizes increase from gigabytes to terabytes or even larger, sequential data mining algorithms may not deliver results in a reasonable amount of time. Even worse, because a single processor may not have enough main memory to hold all the data, many sequential algorithms either cannot handle large-scale problems or must process data out of core, further slowing down the process.

Original language: English (US)
Title of host publication: Handbook of Parallel Computing
Subtitle of host publication: Models, Algorithms and Applications
Publisher: CRC Press
Pages: 32-1 to 32-20
ISBN (Electronic): 9781420011296
ISBN (Print): 9781584886235
State: Published - Jan 1 2007

ASJC Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)
