Much research has shown that social media platforms have substantial population biases. However, very little is known about how these population biases affect the many algorithms that rely on social media data. Focusing on the case study of geolocation inference algorithms and their performance across the urban-rural spectrum, we establish that these algorithms exhibit significantly worse performance for underrepresented populations (i.e. rural users). We further establish that this finding is robust across both text- and network-based algorithm designs. However, we also show that some of this bias can be attributed to the design of algorithms themselves rather than population biases in the underlying data sources. For instance, in some cases, algorithms perform badly for rural users even when we substantially overcorrect for population biases by training exclusively on rural data. We discuss the implications of our findings for the design and study of social media-based algorithms.
|Original language||English (US)|
|Title of host publication||CHI '17 Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems|
|Number of pages||13|
|State||Published - 2017|