TY - GEN
T1 - SES
T2 - 11th IEEE International Conference on Data Mining Workshops, ICDMW 2011
AU - Zhang, Kunpeng
AU - Cheng, Yu
AU - Xie, Yusheng
AU - Honbo, Daniel
AU - Agrawal, Ankit
AU - Palsetia, Diana
AU - Lee, Kathy
AU - Liao, Wei Keng
AU - Choudhary, Alok
PY - 2011
Y1 - 2011
N2 - Social media is becoming a major and popular technological platform that allows users to discuss and share information. Information is generated and managed on either computers or mobile devices by one person and consumed by many others. Most of this user-generated content is textual, as on social networks (Facebook, LinkedIn), microblogging services (Twitter), and blogs (Blogspot, Wordpress). Looking for valuable nuggets of knowledge, such as capturing and summarizing sentiments from this huge amount of data, could help users make informed decisions. In this paper, we develop a sentiment identification system called SES, which implements three different sentiment identification algorithms. We augment basic compositional semantic rules in the first algorithm. In the second algorithm, we argue that sentiment should not simply be classified as positive, negative, or objective, but should instead be represented as a continuous score that reflects the degree of sentiment. All word scores are calculated from a large volume of customer reviews. Due to the special characteristics of social media texts, we propose a third algorithm that takes emoticons, negation word position, and domain-specific words into account. Furthermore, a machine learning model is applied to features derived from the outputs of the three algorithms. We conduct our experiments on user comments from Facebook and tweets from Twitter. The results show that Random Forest achieves better accuracy than decision tree, neural network, and logistic regression. We also propose a flexible way to represent document sentiment based on the sentiment of each sentence it contains. SES is available online.
AB - Social media is becoming a major and popular technological platform that allows users to discuss and share information. Information is generated and managed on either computers or mobile devices by one person and consumed by many others. Most of this user-generated content is textual, as on social networks (Facebook, LinkedIn), microblogging services (Twitter), and blogs (Blogspot, Wordpress). Looking for valuable nuggets of knowledge, such as capturing and summarizing sentiments from this huge amount of data, could help users make informed decisions. In this paper, we develop a sentiment identification system called SES, which implements three different sentiment identification algorithms. We augment basic compositional semantic rules in the first algorithm. In the second algorithm, we argue that sentiment should not simply be classified as positive, negative, or objective, but should instead be represented as a continuous score that reflects the degree of sentiment. All word scores are calculated from a large volume of customer reviews. Due to the special characteristics of social media texts, we propose a third algorithm that takes emoticons, negation word position, and domain-specific words into account. Furthermore, a machine learning model is applied to features derived from the outputs of the three algorithms. We conduct our experiments on user comments from Facebook and tweets from Twitter. The results show that Random Forest achieves better accuracy than decision tree, neural network, and logistic regression. We also propose a flexible way to represent document sentiment based on the sentiment of each sentence it contains. SES is available online.
KW - Machine learning
KW - Rule
KW - Sentiment
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=84863163463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863163463&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2011.153
DO - 10.1109/ICDMW.2011.153
M3 - Conference contribution
AN - SCOPUS:84863163463
SN - 9780769544090
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 129
EP - 136
BT - Proceedings - 11th IEEE International Conference on Data Mining Workshops, ICDMW 2011
Y2 - 11 December 2011
ER -