Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering

Konstantin Makarychev, Yury Makarychev*, Ilya Razenshteyn

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.

Original languageEnglish (US)
Title of host publicationSTOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing
EditorsMoses Charikar, Edith Cohen
PublisherAssociation for Computing Machinery
Pages1027-1038
Number of pages12
ISBN (Electronic)9781450367059
DOIs
StatePublished - Jun 23 2019
Event51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019 - Phoenix, United States
Duration: Jun 23 2019Jun 26 2019

Publication series

NameProceedings of the Annual ACM Symposium on Theory of Computing
ISSN (Print)0737-8017

Conference

Conference51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019
CountryUnited States
CityPhoenix
Period6/23/196/26/19

Fingerprint

Costs

Keywords

  • Clustering
  • Dimension reduction
  • Johnson-Lindenstrauss transform
  • K-means
  • K-medians
  • Kirszbraun theorem

ASJC Scopus subject areas

  • Software

Cite this

Makarychev, K., Makarychev, Y., & Razenshteyn, I. (2019). Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering. In M. Charikar, & E. Cohen (Eds.), STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (pp. 1027-1038). (Proceedings of the Annual ACM Symposium on Theory of Computing). Association for Computing Machinery. https://doi.org/10.1145/3313276.3316350
Makarychev, Konstantin ; Makarychev, Yury ; Razenshteyn, Ilya. / Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering. STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. editor / Moses Charikar ; Edith Cohen. Association for Computing Machinery, 2019. pp. 1027-1038 (Proceedings of the Annual ACM Symposium on Theory of Computing).
@inproceedings{2ca7491ed2ad41479165dc4bc0085d37,
title = "Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering",
abstract = "Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.",
keywords = "Clustering, Dimension reduction, Johnson-Lindenstrauss transform, K-means, K-medians, Kirszbraun theorem",
author = "Konstantin Makarychev and Yury Makarychev and Ilya Razenshteyn",
year = "2019",
month = "6",
day = "23",
doi = "10.1145/3313276.3316350",
language = "English (US)",
series = "Proceedings of the Annual ACM Symposium on Theory of Computing",
publisher = "Association for Computing Machinery",
pages = "1027--1038",
editor = "Moses Charikar and Edith Cohen",
booktitle = "STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing",

}

Makarychev, K, Makarychev, Y & Razenshteyn, I 2019, Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering. in M Charikar & E Cohen (eds), STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. Proceedings of the Annual ACM Symposium on Theory of Computing, Association for Computing Machinery, pp. 1027-1038, 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, United States, 6/23/19. https://doi.org/10.1145/3313276.3316350

Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering. / Makarychev, Konstantin; Makarychev, Yury; Razenshteyn, Ilya.

STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. ed. / Moses Charikar; Edith Cohen. Association for Computing Machinery, 2019. p. 1027-1038 (Proceedings of the Annual ACM Symposium on Theory of Computing).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering

AU - Makarychev, Konstantin

AU - Makarychev, Yury

AU - Razenshteyn, Ilya

PY - 2019/6/23

Y1 - 2019/6/23

N2 - Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.

AB - Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.

KW - Clustering

KW - Dimension reduction

KW - Johnson-Lindenstrauss transform

KW - K-means

KW - K-medians

KW - Kirszbraun theorem

UR - http://www.scopus.com/inward/record.url?scp=85068734142&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068734142&partnerID=8YFLogxK

U2 - 10.1145/3313276.3316350

DO - 10.1145/3313276.3316350

M3 - Conference contribution

AN - SCOPUS:85068734142

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 1027

EP - 1038

BT - STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing

A2 - Charikar, Moses

A2 - Cohen, Edith

PB - Association for Computing Machinery

ER -

Makarychev K, Makarychev Y, Razenshteyn I. Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering. In Charikar M, Cohen E, editors, STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. Association for Computing Machinery. 2019. p. 1027-1038. (Proceedings of the Annual ACM Symposium on Theory of Computing). https://doi.org/10.1145/3313276.3316350