TY - GEN
T1 - On the cost of essentially fair clusterings
AU - Bercea, Ioana O.
AU - Khuller, Samir
AU - Rösner, Clemens
AU - Schmidt, Melanie
AU - Groß, Martin
AU - Kumar, Aounon
AU - Schmidt, Daniel R.
N1 - Funding Information:
The authors would like to thank Sorelle Friedler for useful discussions related to the topic of fairness. The first author?s work was done while a Ph.D. student at the University of Maryland.
Publisher Copyright:
© Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, and Melanie Schmidt.
PY - 2019/9
Y1 - 2019/9
N2 - Clustering is a fundamental tool in data mining and machine learning. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters – especially if the data is already biased. At NIPS 2017, Chierichetti et al. [18] proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and a O(t)-approximation for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center [40]. We extend and improve the known results. Firstly, we give a 5-approximation for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives k-center, k-supplier, k-median, k-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
AB - Clustering is a fundamental tool in data mining and machine learning. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters – especially if the data is already biased. At NIPS 2017, Chierichetti et al. [18] proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and a O(t)-approximation for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center [40]. We extend and improve the known results. Firstly, we give a 5-approximation for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives k-center, k-supplier, k-median, k-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
KW - Approximation
KW - Clustering
KW - Fairness
KW - LP rounding
UR - http://www.scopus.com/inward/record.url?scp=85072851398&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072851398&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.APPROX-RANDOM.2019.18
DO - 10.4230/LIPIcs.APPROX-RANDOM.2019.18
M3 - Conference contribution
AN - SCOPUS:85072851398
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2019
A2 - Achlioptas, Dimitris
A2 - Vegh, Laszlo A.
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 22nd International Conference on Approximation Algorithms for Combinatorial Optimization Problems and 23rd International Conference on Randomization and Computation, APPROX/RANDOM 2019
Y2 - 20 September 2019 through 22 September 2019
ER -