TY - GEN
T1 - Turkers, Scholars, "arafat" and "peace"
T2 - 18th ACM International Conference on Computer-Supported Cooperative Work and Social Computing, CSCW 2015
AU - Sen, Shilad
AU - Giesel, Margaret E.
AU - Gold, Rebecca
AU - Hillmann, Benjamin
AU - Lesicko, Matt
AU - Naden, Samuel
AU - Russell, Jesse
AU - Wang, Zixiao Ken
AU - Hecht, Brent
N1 - Funding Information:
This research has been generously supported by Macalester College and the National Science Foundation (grants IIS-0964697 and IIS-0808692).
Publisher Copyright:
© 2015 ACM.
PY - 2015/2/28
Y1 - 2015/2/28
N2 - In just a few years, crowdsourcing markets like Mechanical Turk have become the dominant mechanism for building "gold standard" datasets in areas of computer science ranging from natural language processing to audio transcription. The assumption behind this sea change, an assumption that is central to the approaches taken in hundreds of research projects, is that crowdsourced markets can accurately replicate the judgments of the general population for knowledge-oriented tasks. Focusing on the important domain of semantic relatedness algorithms and leveraging Clark's theory of common ground as a framework, we demonstrate that this assumption can be highly problematic. Using 7,921 semantic relatedness judgements from 72 scholars and 39 crowdworkers, we show that crowdworkers on Mechanical Turk produce significantly different semantic relatedness gold standard judgements than people from other communities. We also show that algorithms that perform well against Mechanical Turk gold standard datasets do significantly worse when evaluated against other communities' gold standards. Our results call into question the broad use of Mechanical Turk for the development of gold standard datasets and demonstrate the importance of understanding these datasets from a human-centered point-of-view. More generally, our findings problematize the notion that a universal gold standard dataset exists for all knowledge tasks.
AB - In just a few years, crowdsourcing markets like Mechanical Turk have become the dominant mechanism for building "gold standard" datasets in areas of computer science ranging from natural language processing to audio transcription. The assumption behind this sea change, an assumption that is central to the approaches taken in hundreds of research projects, is that crowdsourced markets can accurately replicate the judgments of the general population for knowledge-oriented tasks. Focusing on the important domain of semantic relatedness algorithms and leveraging Clark's theory of common ground as a framework, we demonstrate that this assumption can be highly problematic. Using 7,921 semantic relatedness judgements from 72 scholars and 39 crowdworkers, we show that crowdworkers on Mechanical Turk produce significantly different semantic relatedness gold standard judgements than people from other communities. We also show that algorithms that perform well against Mechanical Turk gold standard datasets do significantly worse when evaluated against other communities' gold standards. Our results call into question the broad use of Mechanical Turk for the development of gold standard datasets and demonstrate the importance of understanding these datasets from a human-centered point-of-view. More generally, our findings problematize the notion that a universal gold standard dataset exists for all knowledge tasks.
KW - Amazon Mechanical Turk
KW - cultural communities
KW - gold standard datasets
KW - natural language processing
KW - semantic relatedness
KW - user studies
UR - http://www.scopus.com/inward/record.url?scp=84957913256&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84957913256&partnerID=8YFLogxK
U2 - 10.1145/2675133.2675285
DO - 10.1145/2675133.2675285
M3 - Conference contribution
AN - SCOPUS:84957913256
T3 - CSCW 2015 - Proceedings of the 2015 ACM International Conference on Computer-Supported Cooperative Work and Social Computing
SP - 826
EP - 838
BT - CSCW 2015 - Proceedings of the 2015 ACM International Conference on Computer-Supported Cooperative Work and Social Computing
PB - Association for Computing Machinery, Inc
Y2 - 14 March 2015 through 18 March 2015
ER -