Evaluation of K-means data clustering algorithm on Intel Xeon Phi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

Intel Xeon Phi is a processor based on MIC architecture that contains a large number of compute cores with a high local memory bandwidth and 512-bit vector processing units. To achieve high performance on Xeon Phi, it is important for programmers to explore all the software features provided by the Intel compiler and libraries to fully utilize the new hardware resources. In this paper, we use the K-Means algorithm to study the performance of various Intel software settings available for Xeon Phi and their impacts to the performance of K-means. At first we examine different memory layouts for storing data points using Intel compiler-intrinsic functions. During distance calculation, the computational kernel of K-means, when the size of individual input data points is not vector-friendly, we pad the data points to align with the VPU width. At last, we implement a parallel reduction to increase memory access parallelism and cache hits. These techniques enable us to successfully take advantage of thread-level parallelism and data-level parallelism on Xeon Phi. Experimental results demonstrate large performance gains over the default auto-vectorization approach. The K-Means implemented with the proposed techniques achieves up to 68.65% and 56.14% performance improvements for aligned datasets and unaligned datasets, respectively. For high-dimensional aligned datasets, we achieved up to 53.49% performance improvement on a large-scale parallel computer.

Original languageEnglish (US)
Title of host publicationProceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
EditorsRonay Ak, George Karypis, Yinglong Xia, Xiaohua Tony Hu, Philip S. Yu, James Joshi, Lyle Ungar, Ling Liu, Aki-Hiro Sato, Toyotaro Suzumura, Sudarsan Rachuri, Rama Govindaraju, Weijia Xu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2251-2260
Number of pages10
ISBN (Electronic)9781467390040
DOIs
StatePublished - Jan 1 2016
Event4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: Dec 5 2016Dec 8 2016

Publication series

NameProceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

Other

Other4th IEEE International Conference on Big Data, Big Data 2016
CountryUnited States
CityWashington
Period12/5/1612/8/16

Keywords

  • Accelerator
  • Data Clustering
  • K-Means
  • Parallelization
  • Xeon Phi

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Hardware and Architecture

Fingerprint Dive into the research topics of 'Evaluation of K-means data clustering algorithm on Intel Xeon Phi'. Together they form a unique fingerprint.

  • Cite this

    Lee, S., Liao, W-K., Agrawal, A., Hardavellas, N., & Choudhary, A. N. (2016). Evaluation of K-means data clustering algorithm on Intel Xeon Phi. In R. Ak, G. Karypis, Y. Xia, X. T. Hu, P. S. Yu, J. Joshi, L. Ungar, L. Liu, A-H. Sato, T. Suzumura, S. Rachuri, R. Govindaraju, & W. Xu (Eds.), Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 2251-2260). [7840856] (Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/BigData.2016.7840856