pPOP: Fast yet accurate parallel hierarchical clustering using partitioning

Manoranjan Dash*, Simona Petrutiu, Peter Scheuermann

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

27 Scopus citations

Abstract

Hierarchical agglomerative clustering (HAC) is very useful but due to high CPU time and memory complexity its practical use is limited. Earlier, we proposed an efficient partitioning - partially overlapping partitioning (POP) - based on the fact that in HAC small and closely placed clusters are agglomerated initially, and only towards the end larger and distant clusters are agglomerated. Here, we present the parallel version of POP, pPOP. Theoretical analysis shows that, compared to the existing algorithms, pPOP achieves CPU time speed-up and memory scale-down of O(c) without compromising accuracy where c is the number of cells in the partition. A shared memory implementation shows that pPOP outperforms existing algorithms significantly.

Original languageEnglish (US)
Pages (from-to)563-578
Number of pages16
JournalData and Knowledge Engineering
Volume61
Issue number3
DOIs
StatePublished - Jun 2007

Keywords

  • Hierarchical agglomerative clustering
  • Parallel algorithm
  • Partitioning
  • Shared memory multiprocessor architecture

ASJC Scopus subject areas

  • Information Systems and Management

Fingerprint

Dive into the research topics of 'pPOP: Fast yet accurate parallel hierarchical clustering using partitioning'. Together they form a unique fingerprint.

Cite this