Clustering semi-random mixtures of Gaussians

Pranjal Awasthi*, Aravindan Vijayaraghavan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.
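For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of textbook Lloyd iterations on a well-separated two-component Gaussian mixture. It is an illustration only: it does not implement the paper's semi-random model, its initialization, or its analysis. The function names (`lloyd_kmeans`, `sample_mixture`), the component means, the noise level, and the rough starting centers are all assumptions chosen for the example.

```python
import random


def lloyd_kmeans(points, centers, max_iters=100):
    """Textbook Lloyd iterations: assign to nearest center, then recompute means."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(len(centers)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])),
            )
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(p[d] for p in members) / len(members)
                          for d in range(len(old)))
                )
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:  # assignments are stable, so stop
            break
        centers = new_centers
    return centers, labels


def sample_mixture(means, n_per_component, sigma, rng):
    """Draw points from spherical Gaussians (hypothetical helper for the demo)."""
    points, truth = [], []
    for j, mu in enumerate(means):
        for _ in range(n_per_component):
            points.append(tuple(rng.gauss(m, sigma) for m in mu))
            truth.append(j)
    return points, truth


rng = random.Random(0)
true_means = [(0.0, 0.0), (10.0, 10.0)]      # assumed, well separated
points, truth = sample_mixture(true_means, 200, 1.0, rng)

initial_centers = [(1.0, 2.0), (8.0, 9.0)]   # assumed rough starting guess
centers, labels = lloyd_kmeans(points, initial_centers)

# Count points whose recovered label differs from the generating component.
errors = sum(1 for lab, t in zip(labels, truth) if lab != t)
print("centers:", centers)
print("misclassified:", errors)
```

With this much separation the clusters are easy to recover; the regime studied in the paper concerns how small the separation can be, and how many points must be misclassified, when the data only approximately follow a Gaussian mixture.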

Original language: English (US)
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Andreas Krause, Jennifer Dy
Publisher: International Machine Learning Society (IMLS)
Pages: 469-494
Number of pages: 26
Volume: 1
ISBN (Electronic): 9781510867963
State: Published - Jan 1 2018
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 to Jul 15 2018

Other

Other: 35th International Conference on Machine Learning, ICML 2018
Country: Sweden
City: Stockholm
Period: 7/10/18 to 7/15/18

Fingerprint

Clustering algorithms
Polynomials
Statistical Models

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Cite this

Awasthi, P., & Vijayaraghavan, A. (2018). Clustering semi-random mixtures of Gaussians. In A. Krause, & J. Dy (Eds.), 35th International Conference on Machine Learning, ICML 2018 (Vol. 1, pp. 469-494). International Machine Learning Society (IMLS).
Awasthi, Pranjal ; Vijayaraghavan, Aravindan. / Clustering semi-random mixtures of Gaussians. 35th International Conference on Machine Learning, ICML 2018. editor / Andreas Krause ; Jennifer Dy. Vol. 1 International Machine Learning Society (IMLS), 2018. pp. 469-494
@inproceedings{ada4e26b98cc40eaa1382291b90589f0,
title = "Clustering semi-random mixtures of Gaussians",
abstract = "Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.",
author = "Pranjal Awasthi and Aravindan Vijayaraghavan",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
volume = "1",
pages = "469--494",
editor = "Andreas Krause and Jennifer Dy",
booktitle = "35th International Conference on Machine Learning, ICML 2018",
publisher = "International Machine Learning Society (IMLS)",

}

Awasthi, P & Vijayaraghavan, A 2018, Clustering semi-random mixtures of Gaussians. in A Krause & J Dy (eds), 35th International Conference on Machine Learning, ICML 2018. vol. 1, International Machine Learning Society (IMLS), pp. 469-494, 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 7/10/18.

Clustering semi-random mixtures of Gaussians. / Awasthi, Pranjal; Vijayaraghavan, Aravindan.

35th International Conference on Machine Learning, ICML 2018. ed. / Andreas Krause; Jennifer Dy. Vol. 1 International Machine Learning Society (IMLS), 2018. p. 469-494.

TY - GEN

T1 - Clustering semi-random mixtures of Gaussians

AU - Awasthi, Pranjal

AU - Vijayaraghavan, Aravindan

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.

AB - Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.

UR - http://www.scopus.com/inward/record.url?scp=85057246073&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057246073&partnerID=8YFLogxK

M3 - Conference contribution

VL - 1

SP - 469

EP - 494

BT - 35th International Conference on Machine Learning, ICML 2018

A2 - Krause, Andreas

A2 - Dy, Jennifer

PB - International Machine Learning Society (IMLS)

ER -

Awasthi P, Vijayaraghavan A. Clustering semi-random mixtures of Gaussians. In Krause A, Dy J, editors, 35th International Conference on Machine Learning, ICML 2018. Vol. 1. International Machine Learning Society (IMLS). 2018. p. 469-494