On the bias and inconsistency of K-means clustering

Research output: Book/Report › Other report

Abstract

We provide a counterexample showing that the K-means clustering algorithm using hard assignments produces biased and inconsistent estimates of the cluster means and variances. We discuss how a Gaussian mixture model that assumes spherical clusters with equal shape and size and makes soft assignments to clusters produces consistent estimates from good starting values and has computational complexity comparable to that of K-means. We recommend that the Gaussian mixture model be used instead of K-means.
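The abstract does not spell out the counterexample, so what follows is only an illustrative sketch under assumed settings, not the authors' construction. It draws 1-D data from an equal-weight mixture of N(-1.5, 1) and N(+1.5, 1) (the value 1.5, the sample size and the seed are arbitrary choices), fits K-means with Lloyd's hard-assignment updates, and then runs EM for a Gaussian mixture with one shared variance and soft assignments, started from the K-means solution as the "good starting values". For this symmetric mixture the population K-means partition splits at 0, so each center converges to E[X | X > 0] = mu(2 Phi(mu) - 1) + 2 phi(mu) ≈ 1.559 (Phi and phi are the standard normal CDF and density) and the pooled within-cluster variance to 1 + mu^2 - 1.559^2 ≈ 0.821, rather than the true 1.5 and 1. The function names `simulate`, `kmeans_1d` and `em_spherical_1d` are hypothetical.

```python
import math
import random


def simulate(n, mu=1.5, sigma=1.0, seed=0):
    """Draw n points from 0.5*N(-mu, sigma^2) + 0.5*N(+mu, sigma^2) (assumed example)."""
    rng = random.Random(seed)
    return [rng.gauss(mu if rng.random() < 0.5 else -mu, sigma) for _ in range(n)]


def kmeans_1d(x, centers, max_iter=100):
    """Lloyd's K-means with hard assignments; returns centers and SSE / n."""
    centers = list(centers)
    for _ in range(max_iter):
        sums, counts = [0.0] * len(centers), [0] * len(centers)
        for xi in x:
            # Hard assignment: each point belongs to exactly one (nearest) center.
            k = min(range(len(centers)), key=lambda j: (xi - centers[j]) ** 2)
            sums[k] += xi
            counts[k] += 1
        new = [s / c if c else old for s, c, old in zip(sums, counts, centers)]
        if new == centers:  # assignments stopped changing
            break
        centers = new
    sse = sum(min((xi - c) ** 2 for c in centers) for xi in x)
    return centers, sse / len(x)


def em_spherical_1d(x, means, var, max_iter=500, tol=1e-6):
    """EM for a K-component Gaussian mixture with a single shared variance.

    Soft assignments (responsibilities) replace the hard assignments of K-means.
    Returns the means, the shared variance and the mixing weights.
    """
    K, n = len(means), len(x)
    means, weights = list(means), [1.0 / K] * K
    for _ in range(max_iter):
        # E-step: r[i][k] = P(component k | x_i), computed with log-sum-exp.
        const = [math.log(w) - 0.5 * math.log(2 * math.pi * var) for w in weights]
        resp = []
        for xi in x:
            logs = [c - (xi - m) ** 2 / (2 * var) for c, m in zip(const, means)]
            top = max(logs)
            e = [math.exp(lp - top) for lp in logs]
            s = sum(e)
            resp.append([v / s for v in e])
        # M-step: responsibility-weighted updates of means, variance and weights.
        nk = [sum(r[k] for r in resp) for k in range(K)]
        new_means = [sum(r[k] * xi for r, xi in zip(resp, x)) / nk[k] for k in range(K)]
        new_var = sum(r[k] * (xi - new_means[k]) ** 2
                      for r, xi in zip(resp, x) for k in range(K)) / n
        weights = [v / n for v in nk]
        shift = max(abs(a - b) for a, b in zip(new_means + [new_var], means + [var]))
        means, var = new_means, new_var
        if shift < tol:
            break
    return means, var, weights


if __name__ == "__main__":
    x = simulate(10_000)
    km_centers, km_var = kmeans_1d(x, [-1.0, 1.0])
    em_means, em_var, em_w = em_spherical_1d(x, km_centers, km_var)
    print("K-means centers:", km_centers, "variance:", km_var)
    print("EM means:", em_means, "variance:", em_var, "weights:", em_w)
```

Run as a script, it should print K-means centers near ±1.56 with a variance near 0.82, and EM means near ±1.5 with a variance near 1; the exact digits depend on the seed. Each EM iteration visits every point-component pair once, O(nK), the same order as one Lloyd iteration, although EM may need more iterations to converge.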
Original language: English (US)
State: Published - 2016

Cite this

@book{1c3feabb1a564020974e6d57e19fa0de,
title = "On the bias and inconsistency of K-means clustering",
abstract = "We provide a counterexample showing that the K-means clustering algorithm using hard assignments produces biased and inconsistent estimates of the cluster means and variances. We discuss how a Gaussian mixture model that assumes spherical clusters with equal shape and size, and makes soft assignments to clusters produces consistent estimates from good starting values, and has computational complexity comparable to K-means. We recommend that the Gaussian mixture model be used instead of K-means.",
author = "Chen Jin and Malthouse, {Edward Carl}",
year = "2016",
language = "English (US)",

}

On the bias and inconsistency of K-means clustering. / Jin, Chen; Malthouse, Edward Carl. 2016.

