### Abstract

Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε^{2})-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.

Original language | English (US) |
---|---|

Title of host publication | STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing |

Editors | Moses Charikar, Edith Cohen |

Publisher | Association for Computing Machinery |

Pages | 1027-1038 |

Number of pages | 12 |

ISBN (Electronic) | 9781450367059 |

DOIs | |

State | Published - Jun 23 2019 |

Event | 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019 - Phoenix, United States Duration: Jun 23 2019 → Jun 26 2019 |

### Publication series

Name | Proceedings of the Annual ACM Symposium on Theory of Computing |
---|---|

ISSN (Print) | 0737-8017 |

### Conference

Conference | 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019 |
---|---|

Country | United States |

City | Phoenix |

Period | 6/23/19 → 6/26/19 |

### Fingerprint

### Keywords

- Clustering
- Dimension reduction
- Johnson-Lindenstrauss transform
- K-means
- K-medians
- Kirszbraun theorem

### ASJC Scopus subject areas

- Software

### Cite this

*STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing*(pp. 1027-1038). (Proceedings of the Annual ACM Symposium on Theory of Computing). Association for Computing Machinery. https://doi.org/10.1145/3313276.3316350

}

*STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing.*Proceedings of the Annual ACM Symposium on Theory of Computing, Association for Computing Machinery, pp. 1027-1038, 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, United States, 6/23/19. https://doi.org/10.1145/3313276.3316350

**Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering.** / Makarychev, Konstantin; Makarychev, Yury; Razenshteyn, Ilya.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Performance of Johnson–Lindenstrauss transform for k-means and k-medians clustering

AU - Makarychev, Konstantin

AU - Makarychev, Yury

AU - Razenshteyn, Ilya

PY - 2019/6/23

Y1 - 2019/6/23

N2 - Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.

AB - Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε)/ε2)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.

KW - Clustering

KW - Dimension reduction

KW - Johnson-Lindenstrauss transform

KW - K-means

KW - K-medians

KW - Kirszbraun theorem

UR - http://www.scopus.com/inward/record.url?scp=85068734142&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068734142&partnerID=8YFLogxK

U2 - 10.1145/3313276.3316350

DO - 10.1145/3313276.3316350

M3 - Conference contribution

AN - SCOPUS:85068734142

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 1027

EP - 1038

BT - STOC 2019 - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing

A2 - Charikar, Moses

A2 - Cohen, Edith

PB - Association for Computing Machinery

ER -