On the bias and inconsistency of K-means clustering

Research output: Book/Report › Other report

Abstract

We provide a counterexample showing that the K-means clustering algorithm, which uses hard assignments, produces biased and inconsistent estimates of the cluster means and variances. We discuss how a Gaussian mixture model that assumes spherical clusters of equal shape and size and makes soft assignments to clusters produces consistent estimates when started from good initial values, with computational complexity comparable to that of K-means. We recommend that the Gaussian mixture model be used instead of K-means.
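The contrast described in the abstract can be illustrated with a small simulation. The sketch below is not the report's counterexample; it is a minimal illustration under assumed settings: one dimension, two equally weighted Gaussian clusters with true means -1 and +1 and true variance 1, and starting values of -0.5 and +0.5. The function names `simulate`, `kmeans_1d`, and `gmm_1d` are hypothetical, and the mixture model is fitted with a plain EM loop with fixed equal weights and one shared variance. For this setup, a back-of-envelope calculation suggests hard assignment should drive the K-means means toward roughly ±1.17 and the within-cluster variance toward roughly 0.64, whereas the soft-assignment estimates should stay close to the true values.

```python
import numpy as np


def simulate(n, seed=0):
    """Draw n points from an equal mixture of N(-1, 1) and N(+1, 1) (assumed setup)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    true_means = np.array([-1.0, 1.0])
    return true_means[labels] + rng.standard_normal(n)


def kmeans_1d(x, init, iters=100):
    """Lloyd's algorithm in 1D: hard-assign each point to its nearest mean."""
    mu = np.array(init, dtype=float)
    for _ in range(iters):
        assign = np.argmin(np.abs(x[:, None] - mu[None, :]), axis=1)
        mu = np.array([x[assign == k].mean() for k in range(len(mu))])
    # Pooled within-cluster variance implied by the final hard assignments.
    var = np.mean((x - mu[assign]) ** 2)
    return mu, var


def gmm_1d(x, init, iters=300):
    """EM for a 1D mixture with equal weights and one shared variance."""
    mu = np.array(init, dtype=float)
    var = x.var()
    for _ in range(iters):
        # E-step: soft responsibilities (equal weights cancel out).
        log_r = -0.5 * (x[:, None] - mu[None, :]) ** 2 / var
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means and shared variance.
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum() / len(x)
    return mu, var


if __name__ == "__main__":
    x = simulate(200_000)
    print("K-means:", kmeans_1d(x, [-0.5, 0.5]))
    print("GMM:    ", gmm_1d(x, [-0.5, 0.5]))
```

Increasing the sample size in this sketch should not close the gap for the hard-assignment estimates, since each cluster average is taken over a truncated portion of the overlapping mixture; that is the sense in which the estimates are inconsistent rather than merely noisy.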
Original language: English (US)
State: Published - 2016
