Soft-Label Integration for Robust Toxicity Classification

Zelei Cheng, Xian Wu, Jiahao Yu, Shuo Han, Xin Qiang Cai, Xinyu Xing

Research output: Contribution to journalConference articlepeer-review

Abstract

Toxicity classification in textual content remains a significant problem. Data with labels from a single annotator fall short of capturing the diversity of human perspectives. Therefore, there is a growing need to incorporate crowdsourced annotations for training an effective toxicity classifier. Additionally, the standard approach to training a classifier using empirical risk minimization (ERM) may fail to address the potential shifts between the training set and testing set due to exploiting spurious correlations. This work introduces a novel bi-level optimization framework that integrates crowdsourced annotations with the soft-labeling technique and optimizes the soft-label weights by Group Distributionally Robust Optimization (GroupDRO) to enhance the robustness against out-of-distribution (OOD) risk. We theoretically prove the convergence of our bi-level optimization algorithm. Experimental results demonstrate that our approach outperforms existing baseline methods in terms of both average and worst-group accuracy, confirming its effectiveness in leveraging crowdsourced annotations to achieve more effective and robust toxicity classification.

Original languageEnglish (US)
JournalAdvances in Neural Information Processing Systems
Volume37
StatePublished - 2024
Event38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: Dec 9 2024Dec 15 2024

Funding

This project was supported in part by NSF Grants 2225234 and 2225225. The third-party dataset was provided by Sec3 Inc. and we release the dataset under the agreement of Sec3 Inc.

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Soft-Label Integration for Robust Toxicity Classification'. Together they form a unique fingerprint.

Cite this