High performance big data clustering

Ankit Agrawal*, Md Mostofa Ali Patwary, William Hendrix, Wei Keng Liao, Alok Choudhary

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

7 Scopus citations

Abstract

Scientific advances are collectively exploding the amount, diversity, and complexity of available data. Our ability to collect huge amounts of data has greatly surpassed our analytical capacity to make sense of it. Efficient use of high performance computing techniques is critical to the success of the data-driven paradigm of scientific discovery. Data clustering is one of the fundamental analytics tasks, heavily relied upon in many application domains such as astrophysics, climate science, and bioinformatics. In this book chapter, we illustrate the challenges and opportunities in mining big data using two recently developed scalable parallel clustering algorithms. Experimental results on millions of high-dimensional data points clustered in parallel on thousands of processor cores are also presented.

Original language: English (US)
Title of host publication: Cloud Computing and Big Data
Publisher: IOS Press BV
Pages: 192-211
Number of pages: 20
ISBN (Print): 9781614993216
State: Published - 2013

Publication series

Name: Advances in Parallel Computing
Volume: 23
ISSN (Print): 0927-5452

Keywords

  • big data
  • clustering
  • density-based clustering
  • hierarchical clustering

ASJC Scopus subject areas

  • General Computer Science
