An Evaluation of Tweet Sentiment Classification Methods

Lihua Yao, Jerry Li, Hassan Alam, Oleg Melnokov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

In this paper, we present the results of our research on predicting sentiment from Twitter data derived from a Kaggle competition. Our goal was to determine the efficacy of different supervised classification methods at predicting whether a tweet's sentiment is Positive, Neutral, or Negative. We evaluated four statistical classification models: (1) Logistic Regression (LR), (2) Linear Support Vector Machine (LSV), (3) Multinomial Naïve Bayesian (NB), and (4) Random Forest (RF). We also evaluated two tokenization methods: (1) Document Term Matrix (DTM) and (2) Term Frequency-Inverse Document Frequency (TF-IDF). We combined these with three extraction methods: (1) Original Tweet Text, (2) Rapid Automatic Keyword Extraction (RAKE), and (3) Hand-curated Selected Text. Furthermore, various neural networks were applied to the Tweet Text and to BERT-extracted data, which reduced the original 1000 features to 768 that were then applied to different models. Our experiments show that RF and LR give the best results and that there is little difference between DTM and TF-IDF. A fully connected neural network (FCNN) performed best on the BERT-extracted data, with a test score of 0.75.
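To make the two feature representations named in the abstract concrete, the following is a minimal sketch of how a Document Term Matrix and a TF-IDF matrix can be built from raw text. It is not the authors' pipeline: the three tweets are invented for illustration, the whitespace tokenizer is deliberately simple, and the weighting idf = ln(N / df) is one common textbook variant chosen as an assumption, since the abstract does not state which formula or library was used. All function names here are hypothetical.

```python
import math
from collections import Counter

# Toy tweets, invented for illustration only (not taken from the Kaggle data).
tweets = [
    "love this phone",
    "hate this phone",
    "this phone is a phone",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_vocabulary(docs):
    """Return a sorted list of every distinct token in the corpus."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def document_term_matrix(docs, vocab):
    """Rows are documents, columns are vocabulary terms, cells are raw counts."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows

def tfidf_matrix(dtm):
    """Reweight raw counts by idf = ln(N / df), an assumed textbook variant."""
    n_docs = len(dtm)
    n_terms = len(dtm[0])
    # Document frequency: number of documents in which each term appears.
    df = [sum(1 for row in dtm if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) for j in range(n_terms)]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in dtm]

vocab = build_vocabulary(tweets)
dtm = document_term_matrix(tweets, vocab)
tfidf = tfidf_matrix(dtm)

for name, matrix in (("DTM", dtm), ("TF-IDF", tfidf)):
    print(name, vocab)
    for row in matrix:
        print([round(value, 3) for value in row])
```

In this toy corpus, "this" and "phone" occur in every tweet, so their TF-IDF weight is zero, while the sentiment-bearing words "love" and "hate" keep a positive weight. Either matrix could then serve as the input features for classifiers such as LR, LSV, NB, or RF.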

Original language: English (US)
Title of host publication: Proceedings - 2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 298-303
Number of pages: 6
ISBN (Electronic): 9781728176246
State: Published - Dec 2020
Event: 2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020 - Las Vegas, United States
Duration: Dec 16 2020 to Dec 18 2020

Publication series

Name: Proceedings - 2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020

Conference

Conference: 2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020
Country/Territory: United States
City: Las Vegas
Period: 12/16/20 to 12/18/20

Keywords

  • Neural Network
  • NLP
  • Processing
  • Statistical Models
  • TF-IDF
  • Twitter

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Safety, Risk, Reliability and Quality
