TY - GEN
T1 - A fast high utility itemsets mining algorithm
AU - Liu, Ying
AU - Liao, Wei Keng
AU - Choudhary, Alok
PY - 2005
Y1 - 2005
N2 - Association rule mining (ARM) identifies frequent itemsets from databases and generates association rules by considering each item in equal value. However, items are actually different in many aspects in a number of real applications, such as retail marketing, network log, etc. The difference between items makes a strong impact on the decision making in these applications. Therefore, traditional ARM cannot meet the demands arising from these applications. By considering the different values of individual items as utilities, utility mining focuses on identifying the itemsets with high utilities. As "downward closure property" doesn't apply to utility mining, the generation of candidate itemsets is the most costly in terms of time and memory space. In this paper, we present a Two-Phase algorithm to efficiently prune down the number of candidates and can precisely obtain the complete set of high utility itemsets. In the first phase, we propose a model that applies the "transaction-weighted downward closure property" on the search space to expedite the identification of candidates. In the second phase, one extra database scan is performed to identify the high utility itemsets. We also parallelize our algorithm on shared memory multi-process architecture using Common Count Partitioned Database (CCPD) strategy. We verify our algorithm by applying it to both synthetic and real databases. It performs very efficiently in terms of speed and memory cost, and shows good scalability on multiple processors, even on large databases that are difficult for existing algorithms to handle.
AB - Association rule mining (ARM) identifies frequent itemsets from databases and generates association rules by considering each item in equal value. However, items are actually different in many aspects in a number of real applications, such as retail marketing, network log, etc. The difference between items makes a strong impact on the decision making in these applications. Therefore, traditional ARM cannot meet the demands arising from these applications. By considering the different values of individual items as utilities, utility mining focuses on identifying the itemsets with high utilities. As "downward closure property" doesn't apply to utility mining, the generation of candidate itemsets is the most costly in terms of time and memory space. In this paper, we present a Two-Phase algorithm to efficiently prune down the number of candidates and can precisely obtain the complete set of high utility itemsets. In the first phase, we propose a model that applies the "transaction-weighted downward closure property" on the search space to expedite the identification of candidates. In the second phase, one extra database scan is performed to identify the high utility itemsets. We also parallelize our algorithm on shared memory multi-process architecture using Common Count Partitioned Database (CCPD) strategy. We verify our algorithm by applying it to both synthetic and real databases. It performs very efficiently in terms of speed and memory cost, and shows good scalability on multiple processors, even on large databases that are difficult for existing algorithms to handle.
KW - association rules mining
KW - downward closure property
KW - transaction-weighted utilization
KW - utility mining
UR - http://www.scopus.com/inward/record.url?scp=77953605998&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953605998&partnerID=8YFLogxK
U2 - 10.1145/1089827.1089839
DO - 10.1145/1089827.1089839
M3 - Conference contribution
AN - SCOPUS:77953605998
SN - 1595932089
SN - 9781595932082
T3 - Proceedings of the 1st International Workshop on Utility-Based Data Mining, UBDM '05
SP - 90
EP - 99
BT - Proceedings of the 1st International Workshop on Utility-Based Data Mining, UBDM '05
T2 - 1st International Workshop on Utility-Based Data Mining, UBDM '05
Y2 - 21 August 2005 through 21 August 2005
ER -