Abstract
In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, be closer to the points in its own cluster than to the points in any other cluster. Our notion can be motivated from several perspectives, including game theory and algorithmic fairness. We study several questions related to our proposed notion. We first show that deciding whether a given data set admits an IP-stable clustering is NP-hard in general. Consequently, we explore the design of efficient algorithms for finding IP-stable clusterings in restricted metric spaces. We present a polynomial-time algorithm that finds a clustering satisfying exact IP-stability on the real line, and an efficient algorithm that finds an IP-stable 2-clustering for a tree metric. We also consider relaxing the stability constraint, i.e., requiring only that no data point be too far from its own cluster compared to any other cluster. For this relaxed setting, we provide polynomial-time algorithms with different guarantees. We evaluate some of our algorithms and several standard clustering approaches on real data sets.
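The stability notion above can be made concrete with a small brute-force checker. This is a sketch of our own, not the authors' code, and it rests on a few assumptions: a point x in cluster C with |C| > 1 is treated as stable when its average distance to the other members of C (excluding x itself) is at most α times its average distance to the members of every other cluster C'; points in singleton clusters are treated as trivially stable; and α = 1 corresponds to exact IP-stability, while α > 1 models the relaxed version. The helper names `ip_violations` and `is_ip_stable` are hypothetical. The checker only verifies a given clustering in O(n²) distance evaluations; it does not find one, and it is unrelated to the paper's algorithms for the real line or tree metrics.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def ip_violations(
    points: Sequence[T],
    labels: Sequence[Hashable],
    dist: Callable[[T, T], float],
    alpha: float = 1.0,
) -> List[int]:
    """Return indices of points that violate alpha-IP-stability (hypothetical helper)."""
    # Group point indices by their cluster label.
    clusters: Dict[Hashable, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    violators: List[int] = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # assumption: a point alone in its cluster counts as stable
        # Average distance to the other members of the point's own cluster.
        own_avg = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        for other_lab, members in clusters.items():
            if other_lab == lab:
                continue
            # Average distance to all members of a competing cluster.
            other_avg = sum(dist(points[i], points[j]) for j in members) / len(members)
            if own_avg > alpha * other_avg:
                violators.append(i)
                break  # one violated comparison is enough to flag the point
    return violators


def is_ip_stable(
    points: Sequence[T],
    labels: Sequence[Hashable],
    dist: Callable[[T, T], float],
    alpha: float = 1.0,
) -> bool:
    """True if no point violates alpha-IP-stability."""
    return not ip_violations(points, labels, dist, alpha)
```

For example, on the real line with d(a, b) = |a - b| and points [0, 1, 10, 11], the clustering {0, 1}, {10, 11} is IP-stable, whereas the interleaved clustering {0, 10}, {1, 11} makes every point unstable at α = 1 yet satisfies the relaxed condition at α = 2.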
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 197-246 |
Number of pages | 50 |
Journal | Proceedings of Machine Learning Research |
Volume | 162 |
Status | Published - 2022 |
Event | 39th International Conference on Machine Learning, ICML 2022, Baltimore, United States |
Duration | Jul 17 2022 to Jul 23 2022 |
Funding
SA is supported by the Simons Collaborative grant on Theory of Algorithmic Fairness and by National Science Foundation grant CCF-1733556. JM is supported by funding from the NSF AI Institute for Foundations of Machine Learning (IFML), an NSF CAREER award, and the Simons Collaborative grant on Theory of Algorithmic Fairness. AV is supported by NSF award CCF-1934843.
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability