### Abstract

Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.
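
For readers unfamiliar with the Lloyd's algorithm named in the abstract, the following is a minimal pure-Python sketch of the textbook iteration (assign each point to its nearest center, then move each center to the mean of its points). It is an illustration only: the function names, the toy data, and the choice of initial centers are assumptions made here, and the initialization and analysis used in the paper are not reproduced.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(
    points: List[Point], init_centers: List[Point], max_iters: int = 100
) -> Tuple[List[Point], List[int]]:
    """Textbook Lloyd iteration; stops when the labels no longer change."""
    centers = list(init_centers)
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest current center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centers, labels


# Toy data (an assumption for illustration): two well-separated spherical
# Gaussians in the plane, 50 points each, with means (0, 0) and (10, 10).
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
points += [(random.gauss(10, 1), random.gauss(10, 1)) for _ in range(50)]

# Initial centers: one sample point from each component (hypothetical choice).
centers, labels = lloyd(points, [points[0], points[50]])
```

On data this well separated, the returned labels split the points by component; the semi-random model studied in the paper is harder than this toy case and is not simulated here.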

Original language | English (US) |
---|---|
Title of host publication | 35th International Conference on Machine Learning, ICML 2018 |
Editors | Andreas Krause, Jennifer Dy |
Publisher | International Machine Learning Society (IMLS) |
Pages | 469-494 |
Number of pages | 26 |
Volume | 1 |
ISBN (Electronic) | 9781510867963 |
State | Published - Jan 1 2018 |
Event | 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden. Duration: Jul 10 2018 → Jul 15 2018 |

### ASJC Scopus subject areas

- Computational Theory and Mathematics
- Human-Computer Interaction
- Software

### Cite this

**Clustering semi-random mixtures of Gaussians.** / Awasthi, Pranjal; Vijayaraghavan, Aravindan.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Clustering semi-random mixtures of Gaussians

AU - Awasthi, Pranjal

AU - Vijayaraghavan, Aravindan

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85057246073&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057246073&partnerID=8YFLogxK

M3 - Conference contribution

VL - 1

SP - 469

EP - 494

BT - 35th International Conference on Machine Learning, ICML 2018

A2 - Krause, Andreas

A2 - Dy, Jennifer

PB - International Machine Learning Society (IMLS)

ER -