Abstract
Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the pth power for any constant p. For k-means, our result resolves an open problem posed by Cohen et al. [STOC 2015, ACM, New York, 2015, pp. 163–172] for k-medians, it answers a question raised by Kannan.
Original language | English (US) |
---|---|
Pages (from-to) | 269-297 |
Number of pages | 29 |
Journal | SIAM Journal on Computing |
Volume | 52 |
Issue number | 2 |
DOIs | |
State | Published - Apr 2023 |
Funding
\u2217Received by the editors April 9, 2020; accepted for publication (in revised form) August 4, 2021; published electronically March 14, 2022. https://doi.org/10.1137/20M1330701 Funding: The first author was supported in part by NSF grants CCF-1955351 and HDR TRIPODS CCF-1934931. The second author was supported by NSF awards CCF-1718820, CCF-1955173, and CCF-1934843. \u2020Computer Science Department, Northwestern University, Evanston, IL 60208 USA (konstantin@ northwestern.edu). \u2021Toyota Technological Institute at Chicago, Chicago, IL 60637 USA ([email protected]). \u00A7CipherMode Labs, Newcastle, WA 98056 USA ([email protected]).
Keywords
- clustering
- dimension reduction
- k-means
ASJC Scopus subject areas
- General Computer Science
- General Mathematics