Distributed Web Log Mining Using Maximal Large Itemsets

Mehmet Sayal, P. Scheuermann

Research output: Contribution to journal › Article › peer-review

Abstract

We introduce a partitioning-based distributed document-clustering algorithm that uses user access patterns from multi-server web sites. The algorithm makes it possible to exploit adaptive document replication and persistent connections simultaneously, two techniques that are most effective in decreasing the response time observed by web users. The algorithm first distributes the user access data evenly among the servers by using a hash function. Each server then generates a local clustering on its share of the user session records by employing a traditional single-machine document-clustering algorithm. Finally, the local clustering results are combined by a novel procedure that generates maximal large itemsets of web documents. We present preliminary experimental results and discuss alternative approaches to be pursued in the future.
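The three stages named in the abstract (hash partitioning, per-server local clustering, and combination into maximal large itemsets) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the use of CRC32 as the hash function, the brute-force frequent-itemset counter standing in for the single-machine clustering algorithm, and the simple union-then-filter merge standing in for the paper's combination procedure.

```python
import zlib
from collections import defaultdict
from itertools import combinations


def partition_sessions(sessions, num_servers):
    """Stage 1 (assumed form): assign each session to a server by hashing its id."""
    shares = defaultdict(list)
    for session_id, docs in sessions.items():
        # CRC32 is a hypothetical choice; the abstract only says "a hash function".
        server = zlib.crc32(session_id.encode("utf-8")) % num_servers
        shares[server].append(frozenset(docs))
    return shares


def local_large_itemsets(share, min_support, max_size=3):
    """Stage 2 stand-in: document sets that co-occur in at least min_support sessions.

    This brute-force counter replaces the unspecified single-machine
    clustering algorithm and is only practical for tiny inputs.
    """
    counts = defaultdict(int)
    for docs in share:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(docs), size):
                counts[frozenset(combo)] += 1
    return {itemset for itemset, count in counts.items() if count >= min_support}


def maximal_itemsets(itemsets):
    """Keep only itemsets that are not a proper subset of another itemset."""
    return {s for s in itemsets if not any(s < other for other in itemsets)}


def combine(local_results):
    """Stage 3 stand-in: union the local results, then keep the maximal itemsets."""
    merged = set().union(*local_results)
    return maximal_itemsets(merged)
```

A possible usage, again under the assumptions above, is to call `partition_sessions` on a mapping from session ids to requested documents, run `local_large_itemsets` on each server's share, and pass the list of per-server results to `combine`. The merge shown here ignores support counts across servers, which a real combination step would likely need to reconcile.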
Original language: English
Pages (from-to): 389-404
Journal: Knowledge and Information Systems
Volume: 2
State: Published - 2001
