TY - JOUR
T1 - PERFORMANCE OF JOHNSON–LINDENSTRAUSS TRANSFORM FOR k-MEANS AND k-MEDIANS CLUSTERING
AU - Makarychev, Konstantin
AU - Makarychev, Yury
AU - Razenshteyn, Ilya
N1 - Funding Information:
∗Received by the editors April 9, 2020; accepted for publication (in revised form) August 4, 2021; published electronically March 14, 2022. https://doi.org/10.1137/20M1330701 Funding: The first author was supported in part by NSF grants CCF-1955351 and HDR TRIPODS CCF-1934931. The second author was supported by NSF awards CCF-1718820, CCF-1955173, and CCF-1934843. †Computer Science Department, Northwestern University, Evanston, IL 60208 USA (konstantin@ northwestern.edu). ‡Toyota Technological Institute at Chicago, Chicago, IL 60637 USA ([email protected]). §CipherMode Labs, Newcastle, WA 98056 USA ([email protected]).
Publisher Copyright:
© 2022 Society for Industrial and Applied Mathematics.
PY - 2023/4
Y1 - 2023/4
N2 - Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the pth power for any constant p. For k-means, our result resolves an open problem posed by Cohen et al. [STOC 2015, ACM, New York, 2015, pp. 163–172] for k-medians, it answers a question raised by Kannan.
AB - Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the pth power for any constant p. For k-means, our result resolves an open problem posed by Cohen et al. [STOC 2015, ACM, New York, 2015, pp. 163–172] for k-medians, it answers a question raised by Kannan.
KW - clustering
KW - dimension reduction
KW - k-means
UR - http://www.scopus.com/inward/record.url?scp=85159770206&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159770206&partnerID=8YFLogxK
U2 - 10.1137/20M1330701
DO - 10.1137/20M1330701
M3 - Article
AN - SCOPUS:85159770206
SN - 0097-5397
VL - 52
SP - 269
EP - 297
JO - SIAM Journal on Computing
JF - SIAM Journal on Computing
IS - 2
ER -