PERFORMANCE OF JOHNSON–LINDENSTRAUSS TRANSFORM FOR k-MEANS AND k-MEDIANS CLUSTERING

Konstantin Makarychev, Yury Makarychev, Ilya Razenshteyn

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε²)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the pth power for any constant p. For k-means, our result resolves an open problem posed by Cohen et al. [STOC 2015, ACM, New York, 2015, pp. 163–172]; for k-medians, it answers a question raised by Kannan.
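The abstract's claim can be illustrated empirically: project the data with a random Gaussian matrix and check that the k-means cost of a fixed clustering is nearly unchanged. Below is a minimal pure-Python sketch; the data, target dimension, and clustering are illustrative choices, not the paper's construction or analysis.

```python
import math
import random

random.seed(0)

def gaussian_projection(points, target_dim):
    """Project points onto a random target_dim-dimensional subspace using a
    Gaussian matrix with i.i.d. N(0, 1/target_dim) entries (a standard
    Johnson-Lindenstrauss transform; norms are preserved in expectation)."""
    d = len(points[0])
    G = [[random.gauss(0.0, 1.0) / math.sqrt(target_dim) for _ in range(d)]
         for _ in range(target_dim)]
    return [[sum(g * x for g, x in zip(row, p)) for row in G] for p in points]

def kmeans_cost(points, assignment, k):
    """Cost of a fixed clustering: total squared distance to cluster centroids."""
    d = len(points[0])
    cost = 0.0
    for c in range(k):
        members = [p for p, a in zip(points, assignment) if a == c]
        if not members:
            continue
        centroid = [sum(p[j] for p in members) / len(members) for j in range(d)]
        cost += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(d))
    return cost

# Toy instance: two well-separated Gaussian blobs in 50 dimensions.
n, d, k = 100, 50, 2
points, assignment = [], []
for i in range(n):
    c = i % k
    points.append([random.gauss(10.0 * c, 1.0) for _ in range(d)])
    assignment.append(c)

original = kmeans_cost(points, assignment, k)
projected = kmeans_cost(gaussian_projection(points, 15), assignment, k)
ratio = projected / original
print(f"cost ratio after projection: {ratio:.3f}")
```

Because the projection is linear, the centroid of the projected points is the projection of the original centroid, so the comparison is exact rather than approximate in that respect; the ratio printed is typically close to 1, consistent with the (1 + ε) guarantee.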

Original language: English (US)
Pages (from-to): 269-297
Number of pages: 29
Journal: SIAM Journal on Computing
Volume: 52
Issue number: 2
DOIs
State: Published - Apr 2023

Keywords

  • clustering
  • dimension reduction
  • k-means

ASJC Scopus subject areas

  • General Computer Science
  • General Mathematics

