A fast high utility itemsets mining algorithm

Ying Liu*, Wei Keng Liao, Alok Choudhary

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

240 Scopus citations

Abstract

Association rule mining (ARM) identifies frequent itemsets from databases and generates association rules by considering each item in equal value. However, items are actually different in many aspects in a number of real applications, such as retail marketing, network log, etc. The difference between items makes a strong impact on the decision making in these applications. Therefore, traditional ARM cannot meet the demands arising from these applications. By considering the different values of individual items as utilities, utility mining focuses on identifying the itemsets with high utilities. As "downward closure property" doesn't apply to utility mining, the generation of candidate itemsets is the most costly in terms of time and memory space. In this paper, we present a Two-Phase algorithm to efficiently prune down the number of candidates and can precisely obtain the complete set of high utility itemsets. In the first phase, we propose a model that applies the "transaction-weighted downward closure property" on the search space to expedite the identification of candidates. In the second phase, one extra database scan is performed to identify the high utility itemsets. We also parallelize our algorithm on shared memory multi-process architecture using Common Count Partitioned Database (CCPD) strategy. We verify our algorithm by applying it to both synthetic and real databases. It performs very efficiently in terms of speed and memory cost, and shows good scalability on multiple processors, even on large databases that are difficult for existing algorithms to handle.

Original languageEnglish (US)
Title of host publicationProceedings of the 1st International Workshop on Utility-Based Data Mining, UBDM '05
Pages90-99
Number of pages10
DOIs
StatePublished - Dec 1 2005
Event1st International Workshop on Utility-Based Data Mining, UBDM '05 - Chicago, IL, United States
Duration: Aug 21 2005Aug 21 2005

Publication series

NameProceedings of the 1st International Workshop on Utility-Based Data Mining, UBDM '05

Other

Other1st International Workshop on Utility-Based Data Mining, UBDM '05
CountryUnited States
CityChicago, IL
Period8/21/058/21/05

Keywords

  • association rules mining
  • downward closure property
  • transaction-weighted utilization
  • utility mining

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications

Fingerprint Dive into the research topics of 'A fast high utility itemsets mining algorithm'. Together they form a unique fingerprint.

Cite this