Streaming algorithms for k-center clustering with outliers and with anonymity

Richard Matthew McCutchen, Samir Khuller

Research output: Chapter in Book/Report/Conference proceedingConference contribution

56 Scopus citations

Abstract

Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-center clustering problem. We give a streaming (4 + ε)-approximation algorithm using O(ε-1 kz) memory for the problem with outliers, in which the clustering is allowed to drop up to z of the input points; previous work used a random sampling approach which yields only a bicriteria approximation. We also give a streaming (6 + ε)-approximation algorithm using O(ε-1 ln (ε-1) k + k 2) memory for a variation motivated by anonymity considerations in which each cluster must contain at least a certain number of input points.

Original languageEnglish (US)
Title of host publicationApproximation, Randomization and Combinatorial Optimization
Subtitle of host publicationAlgorithms and Techniques - 11th International Workshop, APPROX 2008 and 12th International Workshop, RANDOM 2008, Proceedings
Pages165-178
Number of pages14
DOIs
StatePublished - 2008
Event11th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2008 and 12th International Workshop on Randomization and Computation, RANDOM 2008 - Boston, MA, United States
Duration: Aug 25 2008Aug 27 2008

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume5171 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference11th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2008 and 12th International Workshop on Randomization and Computation, RANDOM 2008
Country/TerritoryUnited States
CityBoston, MA
Period8/25/088/27/08

Keywords

  • Anonymity
  • Clustering
  • Outliers
  • Streaming
  • k-center

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint

Dive into the research topics of 'Streaming algorithms for k-center clustering with outliers and with anonymity'. Together they form a unique fingerprint.

Cite this