Using crowdsourcing to improve profanity detection

Sara Owsley Sood*, Judd Antin, Elizabeth F. Churchill

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

24 Citations (Scopus)

Abstract

Profanity detection is often thought to be an easy task. However, past work has shown that current list-based systems perform poorly. First, they fail to adapt to evolving profane slang and to identify profane terms that have been disguised or only partially censored (e.g., @ss, f$#%) or intentionally or unintentionally misspelled (e.g., biatch, shiiiit). For these reasons, they are easy to circumvent and have very poor recall. Second, they are a one-size-fits-all solution, assuming that the definition, use, and perception of profane or inappropriate language hold across all contexts. In this article, we present work that attempts to move beyond list-based profanity detection systems by identifying the context in which profanity occurs. The proposed system uses a set of comments from a social news site labeled by Amazon Mechanical Turk workers for the presence of profanity. This system far surpasses the performance of list-based profanity detection techniques. The use of crowdsourcing in this task suggests an opportunity to build profanity detection systems tailored to sites and communities.
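
The recall problem described in the abstract can be illustrated with a minimal sketch of exact-match, list-based detection. This is not the authors' system or any baseline from the paper: the word list, the function name `list_based_flag`, and the example comments are hypothetical, chosen only to mirror the disguised and misspelled forms quoted above.

```python
import re

# Hypothetical block list; a real list-based filter would hold many more terms.
BLOCKLIST = {"ass", "bitch", "shit"}


def list_based_flag(comment: str) -> bool:
    """Return True if any alphabetic token in the comment is on the block list."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return any(token in BLOCKLIST for token in tokens)


# One plain form plus three variants of the kind named in the abstract
# (symbol substitution, respelling, letter repetition).
examples = ["what an ass", "what an @ss", "biatch please", "oh shiiiit"]
results = {text: list_based_flag(text) for text in examples}

for text, flagged in results.items():
    print(f"{text!r}: flagged={flagged}")
```

Under these assumptions only the plain form is flagged. The three variants pass through because exact token matching has no notion of substitution, respelling, or the surrounding context.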

Original language: English (US)
Title of host publication: Wisdom of the Crowd - Papers from the AAAI Spring Symposium
Pages: 69-74
Number of pages: 6
State: Published - Aug 20 2012
Event: 2012 AAAI Spring Symposium - Stanford, CA, United States
Duration: Mar 26 2012 to Mar 28 2012

Publication series

Name: AAAI Spring Symposium - Technical Report
Volume: SS-12-06

Other

Other: 2012 AAAI Spring Symposium
Country: United States
City: Stanford, CA
Period: 3/26/12 to 3/28/12

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Sood, S. O., Antin, J., & Churchill, E. F. (2012). Using crowdsourcing to improve profanity detection. In Wisdom of the Crowd - Papers from the AAAI Spring Symposium (pp. 69-74). (AAAI Spring Symposium - Technical Report; Vol. SS-12-06).
@inproceedings{be35be7687ff4e63a4f4ee1e56aac4d6,
title = "Using crowdsourcing to improve profanity detection",
author = "Sood, {Sara Owsley} and Judd Antin and Churchill, {Elizabeth F.}",
year = "2012",
month = "8",
day = "20",
language = "English (US)",
isbn = "9781577355557",
series = "AAAI Spring Symposium - Technical Report",
pages = "69--74",
booktitle = "Wisdom of the Crowd - Papers from the AAAI Spring Symposium",

}

TY - GEN

T1 - Using crowdsourcing to improve profanity detection

AU - Sood, Sara Owsley

AU - Antin, Judd

AU - Churchill, Elizabeth F.

PY - 2012/8/20

Y1 - 2012/8/20

UR - http://www.scopus.com/inward/record.url?scp=84865029152&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84865029152&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84865029152

SN - 9781577355557

T3 - AAAI Spring Symposium - Technical Report

SP - 69

EP - 74

BT - Wisdom of the Crowd - Papers from the AAAI Spring Symposium

ER -
