TY - GEN
T1 - Twitter trending topic classification
AU - Lee, Kathy
AU - Palsetia, Diana
AU - Narayanan, Ramanathan
AU - Patwary, Md Mostofa Ali
AU - Agrawal, Ankit
AU - Choudhary, Alok
PY - 2011
Y1 - 2011
AB - With the increasing popularity of microblogging sites, we are in the era of information explosion. As of June 2011, about 200 million tweets are being generated every day. Although Twitter provides a real-time list of the most popular topics people tweet about, known as Trending Topics, it is often hard to understand what these trending topics are about. Therefore, it is important to classify these topics into general categories with high accuracy for better information retrieval. To address this problem, we classify Twitter Trending Topics into 18 general categories such as sports, politics, and technology. We experiment with two approaches to topic classification: (i) the well-known Bag-of-Words approach for text classification and (ii) network-based classification. In the text-based classification method, we construct word vectors from the trending topic definition and tweets, and the commonly used tf-idf weights are used to classify the topics with a Naive Bayes Multinomial classifier. In the network-based classification method, we identify the top 5 similar topics for a given topic based on the number of common influential users. The categories of the similar topics and the number of common influential users between the given topic and its similar topics are used to classify the given topic with a C5.0 decision tree learner. Experiments on a database of 768 randomly selected trending topics (over 18 classes) show that classification accuracy of up to 65% and 70% can be achieved using text-based and network-based classification modeling, respectively.
KW - Social networks
KW - Topic classification
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=84863157333&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863157333&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2011.171
DO - 10.1109/ICDMW.2011.171
M3 - Conference contribution
AN - SCOPUS:84863157333
SN - 9780769544090
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 251
EP - 258
BT - Proceedings - 11th IEEE International Conference on Data Mining Workshops, ICDMW 2011
T2 - 11th IEEE International Conference on Data Mining Workshops, ICDMW 2011
Y2 - 11 December 2011
ER -