TY - GEN
T1 - Using crowdsourcing to improve profanity detection
AU - Sood, Sara Owsley
AU - Antin, Judd
AU - Churchill, Elizabeth F.
PY - 2012/8/20
Y1 - 2012/8/20
N2 - Profanity detection is often thought to be an easy task. However, past work has shown that current list-based systems perform poorly. They fail to adapt to evolving profane slang and fail to identify profane terms that have been disguised or only partially censored (e.g., @ss, f$#%) or intentionally or unintentionally misspelled (e.g., biatch, shiiiit). For these reasons, they are easy to circumvent and have very poor recall. In addition, they are a one-size-fits-all solution, assuming that the definition, use, and perception of profane or inappropriate language hold across all contexts. In this article, we present work that attempts to move beyond list-based profanity detection systems by identifying the context in which profanity occurs. The proposed system uses a set of comments from a social news site labeled by Amazon Mechanical Turk workers for the presence of profanity. This system far surpasses the performance of list-based profanity detection techniques. The use of crowdsourcing in this task suggests an opportunity to build profanity detection systems tailored to sites and communities.
AB - Profanity detection is often thought to be an easy task. However, past work has shown that current list-based systems perform poorly. They fail to adapt to evolving profane slang and fail to identify profane terms that have been disguised or only partially censored (e.g., @ss, f$#%) or intentionally or unintentionally misspelled (e.g., biatch, shiiiit). For these reasons, they are easy to circumvent and have very poor recall. In addition, they are a one-size-fits-all solution, assuming that the definition, use, and perception of profane or inappropriate language hold across all contexts. In this article, we present work that attempts to move beyond list-based profanity detection systems by identifying the context in which profanity occurs. The proposed system uses a set of comments from a social news site labeled by Amazon Mechanical Turk workers for the presence of profanity. This system far surpasses the performance of list-based profanity detection techniques. The use of crowdsourcing in this task suggests an opportunity to build profanity detection systems tailored to sites and communities.
UR - http://www.scopus.com/inward/record.url?scp=84865029152&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865029152&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84865029152
SN - 9781577355557
T3 - AAAI Spring Symposium - Technical Report
SP - 69
EP - 74
BT - Wisdom of the Crowd - Papers from the AAAI Spring Symposium
T2 - 2012 AAAI Spring Symposium
Y2 - 26 March 2012 through 28 March 2012
ER -