Predicting the demographics of Twitter users from website traffic data

Aron Culotta, Nirmal Kumar Ravi, Jennifer Cutler

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

103 Scopus citations

Abstract

Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. In this paper, we predict the demographics of Twitter users based on whom they follow. Whereas most prior approaches rely on supervised learning, in which individual users are labeled with demographics, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of visitors to gizmodo.com are estimated to have a bachelor's degree). We then fit a regression model to predict these demographics using information about the followers of each website on Twitter. The resulting average held-out correlation is .77 across six different variables (gender, age, ethnicity, education, income, and child status). We additionally validate the model on a smaller set of Twitter users labeled individually for ethnicity and gender, finding performance that is surprisingly competitive with a fully supervised approach.
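
The pipeline described in the abstract (website-level demographic targets, follower-derived features, a regression fit, and a held-out correlation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the five features stand in for hypothetical "fraction of a site's followers who also follow account j" values, and ridge regression is a stand-in because the abstract does not say which regression model the authors used. It does not reproduce the paper's data or its .77 result.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(rows, targets, alpha=1.0):
    """Return (weights, intercept) for ridge regression via the normal equations."""
    n, k = len(rows), len(rows[0])
    # Center features and targets so the intercept is not penalized.
    x_mean = [sum(r[j] for r in rows) / n for j in range(k)]
    y_mean = sum(targets) / n
    xc = [[r[j] - x_mean[j] for j in range(k)] for r in rows]
    yc = [t - y_mean for t in targets]
    # Gram matrix X^T X with alpha added on the diagonal.
    gram = [
        [sum(xc[i][p] * xc[i][q] for i in range(n)) + (alpha if p == q else 0.0)
         for q in range(k)]
        for p in range(k)
    ]
    xty = [sum(xc[i][p] * yc[i] for i in range(n)) for p in range(k)]
    w = solve_linear_system(gram, xty)
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(k))
    return w, intercept


def predict(rows, w, intercept):
    return [intercept + sum(wj * xj for wj, xj in zip(w, r)) for r in rows]


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


# Synthetic stand-in data (assumption): each row is one website, each target
# is a made-up "percent of visitors with a bachelor's degree".
rng = random.Random(0)
true_w = [20.0, -15.0, 10.0, 5.0, -8.0]
features, targets = [], []
for _ in range(200):
    x = [rng.random() for _ in true_w]
    y = 50.0 + sum(w * xi for w, xi in zip(true_w, x)) + rng.gauss(0.0, 2.0)
    features.append(x)
    targets.append(y)

# Hold out the last 50 websites for evaluation.
train_x, test_x = features[:150], features[150:]
train_y, test_y = targets[:150], targets[150:]

weights, intercept = fit_ridge(train_x, train_y, alpha=1.0)
predictions = predict(test_x, weights, intercept)
held_out_r = pearson(predictions, test_y)
print("held-out correlation:", round(held_out_r, 3))
```

In this setup the labels attach to websites rather than to individual users, which is the sense in which the abstract calls the dataset distantly labeled; a fitted model of this kind could then be applied to an individual user's follow list, though how the authors do that is not described in the abstract.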

Original language: English (US)
Title of host publication: Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015
Publisher: AI Access Foundation
Pages: 72-78
Number of pages: 7
ISBN (Electronic): 9781577356998
State: Published - Jun 1 2015
Event: 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015 - Austin, United States
Duration: Jan 25 2015 - Jan 30 2015

Publication series

Name: Proceedings of the National Conference on Artificial Intelligence
Volume: 1

Other

Other: 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015
Country/Territory: United States
City: Austin
Period: 1/25/15 - 1/30/15

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
