A scalable algorithm for single-linkage hierarchical clustering on distributed-memory architectures

William Hendrix, Diana Palsetia, Md Mostofa Ali Patwary, Ankit Agrawal, Wei-Keng Liao, Alok Nidhi Choudhary

Research output: Contribution to conferencePaper

11 Scopus citations

Abstract

Hierarchical clustering is a fundamental and widely-used clustering algorithm with many advantages over traditional partitional clustering. Due to the explosion in size of modern scientific datasets, there is a pressing need for scalable analytics algorithms, but good scaling is difficult to achieve for hierarchical clustering due to data dependencies inherent in the algorithm. To the best of our knowledge, no previous work on parallel hierarchical clustering has shown scalability beyond a couple hundred processes. In this paper, we present PINK, a scalable parallel algorithm for single-linkage hierarchical clustering based on decomposing a problem instance into two different types of subproblems. Despite the heterogeneous workloads, our algorithm exhibits good load balancing, as well as low memory requirements and a communication pattern that is both low-volume and deterministic. Evaluating PINK on up to 6050 processes, we find that it achieves speedups up to approximately 6600.

Original languageEnglish (US)
Pages7-13
Number of pages7
DOIs
StatePublished - Jan 1 2013
Event2013 3rd IEEE Symposium on Large-Scale Data Analysis and Visualization, LDAV 2013 - Atlanta, GA, United States
Duration: Oct 13 2013Oct 14 2013

Other

Other2013 3rd IEEE Symposium on Large-Scale Data Analysis and Visualization, LDAV 2013
CountryUnited States
CityAtlanta, GA
Period10/13/1310/14/13

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Fingerprint Dive into the research topics of 'A scalable algorithm for single-linkage hierarchical clustering on distributed-memory architectures'. Together they form a unique fingerprint.

  • Cite this

    Hendrix, W., Palsetia, D., Patwary, M. M. A., Agrawal, A., Liao, W-K., & Choudhary, A. N. (2013). A scalable algorithm for single-linkage hierarchical clustering on distributed-memory architectures. 7-13. Paper presented at 2013 3rd IEEE Symposium on Large-Scale Data Analysis and Visualization, LDAV 2013, Atlanta, GA, United States. https://doi.org/10.1109/LDAV.2013.6675153