Abstract
This study addresses a binary classification task to determine whether a text sequence, either a sentence or paragraph, is subjective or objective. The task spans five languages-Arabic, Bulgarian, English, German, and Italian-along with a multilingual category. Our approach involved several key techniques. Initially, we preprocessed the data through parts of speech (POS) tagging, identification of question marks, and application of attention masks. We fine-tuned the sentiment-based Transformer model 'MarieAngeA13/Sentiment-AnalysisBERT' on our dataset. Given the imbalance with more objective data, we implemented a custom classifier that assigned greater weight to objective data. Additionally, we translated non-English data into English to maintain consistency across the dataset. Our model achieved notable results, scoring top marks for the multilingual dataset (Macro F1-0.7121) and German (Macro F1-0.7908). It ranked second for Arabic (Macro F1-0.4908) and Bulgarian (Macro F1-0.7169), third for Italian (Macro F1-0.7430), and ninth for English (Macro F1-0.6893).
Original language | English (US) |
---|---|
Pages (from-to) | 361-368 |
Number of pages | 8 |
Journal | CEUR Workshop Proceedings |
Volume | 3740 |
State | Published - 2024 |
Event | 25th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2024 - Grenoble, France Duration: Sep 9 2024 → Sep 12 2024 |
Funding
We acknowledge Qatar National Research Fund grant NPRP14C0916-210015 from the Qatar Research Development and Innovation Council (QRDI) for funding this research.
Keywords
- fact checking
- natural language processing
- news articles
- sentiment
- subjectivity
- text sequence
ASJC Scopus subject areas
- General Computer Science