High performance big data clustering

Ankit Agrawal, Md Mostofa Ali Patwary, William Hendrix, Wei Keng Liao, Alok Choudhary

Research output: Chapter in Book/Report/Conference proceeding › Chapter

  • 4 Citations

Abstract

Scientific advances are collectively exploding the amount, diversity, and complexity of data becoming available. Our ability to collect huge amounts of data has greatly surpassed our analytical capacity to make sense of it. Efficient use of high performance computing techniques is critical for the success of the data-driven paradigm to scientific discovery. Data clustering is one of the fundamental analytics tasks heavily relied upon in many application domains, like astrophysics, climate science, bioinformatics, etc. In this book chapter, we illustrate the challenges and opportunities in mining big data using two recently developed scalable parallel clustering algorithms. Experimental results on millions of high-dimensional data points clustered in parallel on thousands of processor cores are also presented.

Original language: English
Title of host publication: Cloud Computing and Big Data
Publisher: IOS Press BV
Pages: 192-211
Number of pages: 20
Volume: 23
ISBN (Print): 9781614993216
DOIs: 10.3233/978-1-61499-322-3-192
State: Published - 2013

Publication series

Name: Advances in Parallel Computing
Volume: 23
ISSN (Print): 0927-5452

Fingerprint

Big data
Bioinformatics
Parallel algorithms
Clustering algorithms

Keywords

  • big data
  • clustering
  • density-based clustering
  • hierarchical clustering

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Agrawal, A., Patwary, M. M. A., Hendrix, W., Liao, W. K., & Choudhary, A. (2013). High performance big data clustering. In Cloud Computing and Big Data (Vol. 23, pp. 192-211). (Advances in Parallel Computing; Vol. 23). IOS Press BV. DOI: 10.3233/978-1-61499-322-3-192

High performance big data clustering. / Agrawal, Ankit; Patwary, Md Mostofa Ali; Hendrix, William; Liao, Wei Keng; Choudhary, Alok.

Cloud Computing and Big Data. Vol. 23 IOS Press BV, 2013. p. 192-211 (Advances in Parallel Computing; Vol. 23).

Agrawal, A, Patwary, MMA, Hendrix, W, Liao, WK & Choudhary, A 2013, High performance big data clustering. in Cloud Computing and Big Data. vol. 23, Advances in Parallel Computing, vol. 23, IOS Press BV, pp. 192-211. DOI: 10.3233/978-1-61499-322-3-192
Agrawal A, Patwary MMA, Hendrix W, Liao WK, Choudhary A. High performance big data clustering. In Cloud Computing and Big Data. Vol. 23. IOS Press BV. 2013. p. 192-211. (Advances in Parallel Computing). DOI: 10.3233/978-1-61499-322-3-192

Agrawal, Ankit; Patwary, Md Mostofa Ali; Hendrix, William; Liao, Wei Keng; Choudhary, Alok / High performance big data clustering.

Cloud Computing and Big Data. Vol. 23 IOS Press BV, 2013. p. 192-211 (Advances in Parallel Computing; Vol. 23).

@inbook{51023385bbca415f9274d9dd9c987835,
title = "High performance big data clustering",
abstract = "Scientific advances are collectively exploding the amount, diversity, and complexity of data becoming available. Our ability to collect huge amounts of data has greatly surpassed our analytical capacity to make sense of it. Efficient use of high performance computing techniques is critical for the success of the data-driven paradigm to scientific discovery. Data clustering is one of the fundamental analytics tasks heavily relied upon in many application domains, like astrophysics, climate science, bioinformatics, etc. In this book chapter, we illustrate the challenges and opportunities in mining big data using two recently developed scalable parallel clustering algorithms. Experimental results on millions of high-dimensional data points clustered in parallel on thousands of processor cores are also presented.",
keywords = "big data, clustering, density-based clustering, hierarchical clustering",
author = "Agrawal, Ankit and Patwary, {Md Mostofa Ali} and Hendrix, William and Liao, {Wei Keng} and Choudhary, Alok",
year = "2013",
doi = "10.3233/978-1-61499-322-3-192",
isbn = "9781614993216",
volume = "23",
series = "Advances in Parallel Computing",
publisher = "IOS Press BV",
pages = "192--211",
booktitle = "Cloud Computing and Big Data",

}

TY - CHAP

T1 - High performance big data clustering

AU - Agrawal, Ankit

AU - Patwary, Md Mostofa Ali

AU - Hendrix, William

AU - Liao, Wei Keng

AU - Choudhary, Alok

PY - 2013

Y1 - 2013

N2 - Scientific advances are collectively exploding the amount, diversity, and complexity of data becoming available. Our ability to collect huge amounts of data has greatly surpassed our analytical capacity to make sense of it. Efficient use of high performance computing techniques is critical for the success of the data-driven paradigm to scientific discovery. Data clustering is one of the fundamental analytics tasks heavily relied upon in many application domains, like astrophysics, climate science, bioinformatics, etc. In this book chapter, we illustrate the challenges and opportunities in mining big data using two recently developed scalable parallel clustering algorithms. Experimental results on millions of high-dimensional data points clustered in parallel on thousands of processor cores are also presented.

KW - big data

KW - clustering

KW - density-based clustering

KW - hierarchical clustering

UR - http://www.scopus.com/inward/record.url?scp=84895107082&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84895107082&partnerID=8YFLogxK

U2 - 10.3233/978-1-61499-322-3-192

DO - 10.3233/978-1-61499-322-3-192

M3 - Chapter

SN - 9781614993216

VL - 23

T3 - Advances in Parallel Computing

SP - 192

EP - 211

BT - Cloud Computing and Big Data

PB - IOS Press BV

ER -