Using crowdsourcing to improve profanity detection

Sara Owsley Sood*, Judd Antin, Elizabeth F. Churchill

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

58 Scopus citations

Abstract

Profanity detection is often thought to be an easy task. However, past work has shown that current, list-based systems perform poorly. They fail to adapt to evolving profane slang and to identify profane terms that have been disguised or only partially censored (e.g., @ss, f$#%) or intentionally or unintentionally misspelled (e.g., biatch, shiiiit). For these reasons, they are easy to circumvent and have very poor recall. Second, they are a one-size-fits-all solution, assuming that the definition, use, and perception of profane or inappropriate language hold across all contexts. In this article, we present work that attempts to move beyond list-based profanity detection systems by identifying the context in which profanity occurs. The proposed system uses a set of comments from a social news site, labeled by Amazon Mechanical Turk workers for the presence of profanity. This system far surpasses the performance of list-based profanity detection techniques. The use of crowdsourcing in this task suggests an opportunity to build profanity detection systems tailored to specific sites and communities.
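The failure mode the abstract describes can be made concrete with a small illustrative sketch (not the paper's system, and the word list and function names are invented for illustration): a naive list-based filter matches tokens against a fixed lexicon, so disguised or misspelled variants slip through unflagged.

```python
# Illustrative sketch of a naive list-based profanity filter of the
# kind the abstract critiques. It flags only exact lexicon matches.
PROFANITY_LIST = {"ass", "bitch"}  # tiny stand-in word list (hypothetical)

def list_based_flag(comment: str) -> bool:
    """Flag a comment if any token exactly matches the word list."""
    tokens = comment.lower().split()
    return any(tok.strip(".,!?") in PROFANITY_LIST for tok in tokens)

print(list_based_flag("what an ass"))    # True: exact match is caught
print(list_based_flag("what an @ss"))    # False: disguised variant evades the list
print(list_based_flag("what a biatch"))  # False: misspelling evades the list
```

Because obfuscations like these are trivial for users to produce and endless to enumerate, list-based recall stays low, which is the gap the crowdsourced, context-aware labels are meant to close.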

Original language: English (US)
Title of host publication: Wisdom of the Crowd - Papers from the AAAI Spring Symposium
Pages: 69-74
Number of pages: 6
State: Published - Aug 20 2012
Event: 2012 AAAI Spring Symposium - Stanford, CA, United States
Duration: Mar 26 2012 - Mar 28 2012

Publication series

Name: AAAI Spring Symposium - Technical Report
Volume: SS-12-06

Other

Other: 2012 AAAI Spring Symposium
Country/Territory: United States
City: Stanford, CA
Period: 3/26/12 - 3/28/12

ASJC Scopus subject areas

  • Artificial Intelligence
