Evaluation of K-means data clustering algorithm on Intel Xeon Phi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

Intel Xeon Phi is a processor based on MIC architecture that contains a large number of compute cores with a high local memory bandwidth and 512-bit vector processing units. To achieve high performance on Xeon Phi, it is important for programmers to explore all the software features provided by the Intel compiler and libraries to fully utilize the new hardware resources. In this paper, we use the K-Means algorithm to study the performance of various Intel software settings available for Xeon Phi and their impacts to the performance of K-means. At first we examine different memory layouts for storing data points using Intel compiler-intrinsic functions. During distance calculation, the computational kernel of K-means, when the size of individual input data points is not vector-friendly, we pad the data points to align with the VPU width. At last, we implement a parallel reduction to increase memory access parallelism and cache hits. These techniques enable us to successfully take advantage of thread-level parallelism and data-level parallelism on Xeon Phi. Experimental results demonstrate large performance gains over the default auto-vectorization approach. The K-Means implemented with the proposed techniques achieves up to 68.65% and 56.14% performance improvements for aligned datasets and unaligned datasets, respectively. For high-dimensional aligned datasets, we achieved up to 53.49% performance improvement on a large-scale parallel computer.

Original languageEnglish (US)
Title of host publicationProceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
EditorsRonay Ak, George Karypis, Yinglong Xia, Xiaohua Tony Hu, Philip S. Yu, James Joshi, Lyle Ungar, Ling Liu, Aki-Hiro Sato, Toyotaro Suzumura, Sudarsan Rachuri, Rama Govindaraju, Weijia Xu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2251-2260
Number of pages10
ISBN (Electronic)9781467390040
DOIs
StatePublished - Jan 1 2016
Event4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: Dec 5 2016Dec 8 2016

Publication series

NameProceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

Other

Other4th IEEE International Conference on Big Data, Big Data 2016
CountryUnited States
CityWashington
Period12/5/1612/8/16

Fingerprint

Clustering algorithms
Data storage equipment
Hardware
Bandwidth
Processing

Keywords

  • Accelerator
  • Data Clustering
  • K-Means
  • Parallelization
  • Xeon Phi

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Hardware and Architecture

Cite this

Lee, S., Liao, W-K., Agrawal, A., Hardavellas, N., & Choudhary, A. N. (2016). Evaluation of K-means data clustering algorithm on Intel Xeon Phi. In R. Ak, G. Karypis, Y. Xia, X. T. Hu, P. S. Yu, J. Joshi, L. Ungar, L. Liu, A-H. Sato, T. Suzumura, S. Rachuri, R. Govindaraju, ... W. Xu (Eds.), Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 2251-2260). [7840856] (Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/BigData.2016.7840856
Lee, Sunwoo ; Liao, Wei-Keng ; Agrawal, Ankit ; Hardavellas, Nikos ; Choudhary, Alok Nidhi. / Evaluation of K-means data clustering algorithm on Intel Xeon Phi. Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016. editor / Ronay Ak ; George Karypis ; Yinglong Xia ; Xiaohua Tony Hu ; Philip S. Yu ; James Joshi ; Lyle Ungar ; Ling Liu ; Aki-Hiro Sato ; Toyotaro Suzumura ; Sudarsan Rachuri ; Rama Govindaraju ; Weijia Xu. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 2251-2260 (Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016).
@inproceedings{d64bdadc30174236973b4f9d203d30e1,
title = "Evaluation of K-means data clustering algorithm on Intel Xeon Phi",
abstract = "Intel Xeon Phi is a processor based on MIC architecture that contains a large number of compute cores with a high local memory bandwidth and 512-bit vector processing units. To achieve high performance on Xeon Phi, it is important for programmers to explore all the software features provided by the Intel compiler and libraries to fully utilize the new hardware resources. In this paper, we use the K-Means algorithm to study the performance of various Intel software settings available for Xeon Phi and their impacts to the performance of K-means. At first we examine different memory layouts for storing data points using Intel compiler-intrinsic functions. During distance calculation, the computational kernel of K-means, when the size of individual input data points is not vector-friendly, we pad the data points to align with the VPU width. At last, we implement a parallel reduction to increase memory access parallelism and cache hits. These techniques enable us to successfully take advantage of thread-level parallelism and data-level parallelism on Xeon Phi. Experimental results demonstrate large performance gains over the default auto-vectorization approach. The K-Means implemented with the proposed techniques achieves up to 68.65{\%} and 56.14{\%} performance improvements for aligned datasets and unaligned datasets, respectively. For high-dimensional aligned datasets, we achieved up to 53.49{\%} performance improvement on a large-scale parallel computer.",
keywords = "Accelerator, Data Clustering, K-Means, Parallelization, Xeon Phi",
author = "Sunwoo Lee and Wei-Keng Liao and Ankit Agrawal and Nikos Hardavellas and Choudhary, {Alok Nidhi}",
year = "2016",
month = "1",
day = "1",
doi = "10.1109/BigData.2016.7840856",
language = "English (US)",
series = "Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "2251--2260",
editor = "Ronay Ak and George Karypis and Yinglong Xia and Hu, {Xiaohua Tony} and Yu, {Philip S.} and James Joshi and Lyle Ungar and Ling Liu and Aki-Hiro Sato and Toyotaro Suzumura and Sudarsan Rachuri and Rama Govindaraju and Weijia Xu",
booktitle = "Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016",
address = "United States",

}

Lee, S, Liao, W-K, Agrawal, A, Hardavellas, N & Choudhary, AN 2016, Evaluation of K-means data clustering algorithm on Intel Xeon Phi. in R Ak, G Karypis, Y Xia, XT Hu, PS Yu, J Joshi, L Ungar, L Liu, A-H Sato, T Suzumura, S Rachuri, R Govindaraju & W Xu (eds), Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016., 7840856, Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, Institute of Electrical and Electronics Engineers Inc., pp. 2251-2260, 4th IEEE International Conference on Big Data, Big Data 2016, Washington, United States, 12/5/16. https://doi.org/10.1109/BigData.2016.7840856

Evaluation of K-means data clustering algorithm on Intel Xeon Phi. / Lee, Sunwoo; Liao, Wei-Keng; Agrawal, Ankit; Hardavellas, Nikos; Choudhary, Alok Nidhi.

Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016. ed. / Ronay Ak; George Karypis; Yinglong Xia; Xiaohua Tony Hu; Philip S. Yu; James Joshi; Lyle Ungar; Ling Liu; Aki-Hiro Sato; Toyotaro Suzumura; Sudarsan Rachuri; Rama Govindaraju; Weijia Xu. Institute of Electrical and Electronics Engineers Inc., 2016. p. 2251-2260 7840856 (Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Evaluation of K-means data clustering algorithm on Intel Xeon Phi

AU - Lee, Sunwoo

AU - Liao, Wei-Keng

AU - Agrawal, Ankit

AU - Hardavellas, Nikos

AU - Choudhary, Alok Nidhi

PY - 2016/1/1

Y1 - 2016/1/1

N2 - Intel Xeon Phi is a processor based on MIC architecture that contains a large number of compute cores with a high local memory bandwidth and 512-bit vector processing units. To achieve high performance on Xeon Phi, it is important for programmers to explore all the software features provided by the Intel compiler and libraries to fully utilize the new hardware resources. In this paper, we use the K-Means algorithm to study the performance of various Intel software settings available for Xeon Phi and their impacts to the performance of K-means. At first we examine different memory layouts for storing data points using Intel compiler-intrinsic functions. During distance calculation, the computational kernel of K-means, when the size of individual input data points is not vector-friendly, we pad the data points to align with the VPU width. At last, we implement a parallel reduction to increase memory access parallelism and cache hits. These techniques enable us to successfully take advantage of thread-level parallelism and data-level parallelism on Xeon Phi. Experimental results demonstrate large performance gains over the default auto-vectorization approach. The K-Means implemented with the proposed techniques achieves up to 68.65% and 56.14% performance improvements for aligned datasets and unaligned datasets, respectively. For high-dimensional aligned datasets, we achieved up to 53.49% performance improvement on a large-scale parallel computer.

AB - Intel Xeon Phi is a processor based on MIC architecture that contains a large number of compute cores with a high local memory bandwidth and 512-bit vector processing units. To achieve high performance on Xeon Phi, it is important for programmers to explore all the software features provided by the Intel compiler and libraries to fully utilize the new hardware resources. In this paper, we use the K-Means algorithm to study the performance of various Intel software settings available for Xeon Phi and their impacts to the performance of K-means. At first we examine different memory layouts for storing data points using Intel compiler-intrinsic functions. During distance calculation, the computational kernel of K-means, when the size of individual input data points is not vector-friendly, we pad the data points to align with the VPU width. At last, we implement a parallel reduction to increase memory access parallelism and cache hits. These techniques enable us to successfully take advantage of thread-level parallelism and data-level parallelism on Xeon Phi. Experimental results demonstrate large performance gains over the default auto-vectorization approach. The K-Means implemented with the proposed techniques achieves up to 68.65% and 56.14% performance improvements for aligned datasets and unaligned datasets, respectively. For high-dimensional aligned datasets, we achieved up to 53.49% performance improvement on a large-scale parallel computer.

KW - Accelerator

KW - Data Clustering

KW - K-Means

KW - Parallelization

KW - Xeon Phi

UR - http://www.scopus.com/inward/record.url?scp=85015224691&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015224691&partnerID=8YFLogxK

U2 - 10.1109/BigData.2016.7840856

DO - 10.1109/BigData.2016.7840856

M3 - Conference contribution

AN - SCOPUS:85015224691

T3 - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

SP - 2251

EP - 2260

BT - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

A2 - Ak, Ronay

A2 - Karypis, George

A2 - Xia, Yinglong

A2 - Hu, Xiaohua Tony

A2 - Yu, Philip S.

A2 - Joshi, James

A2 - Ungar, Lyle

A2 - Liu, Ling

A2 - Sato, Aki-Hiro

A2 - Suzumura, Toyotaro

A2 - Rachuri, Sudarsan

A2 - Govindaraju, Rama

A2 - Xu, Weijia

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Lee S, Liao W-K, Agrawal A, Hardavellas N, Choudhary AN. Evaluation of K-means data clustering algorithm on Intel Xeon Phi. In Ak R, Karypis G, Xia Y, Hu XT, Yu PS, Joshi J, Ungar L, Liu L, Sato A-H, Suzumura T, Rachuri S, Govindaraju R, Xu W, editors, Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016. Institute of Electrical and Electronics Engineers Inc. 2016. p. 2251-2260. 7840856. (Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016). https://doi.org/10.1109/BigData.2016.7840856