TY - JOUR
T1 - Crowd-sourcing prosodic annotation
AU - Cole, Jennifer
AU - Mahrt, Timothy
AU - Roy, Joseph
N1 - Funding Information:
This work was supported by grants to the first author from the National Science Foundation [BCS 12-51343 and SMA 14-16791]. Thanks to José Hualde, Chris Eager, and Suyeon Im for helpful discussion and for their contributions to data collection and to the design of the RPT tasks used in this study. Thanks also to participants at the 3rd Conference on Experimental and Theoretical Approaches to Prosody for comments and questions on this work.
Publisher Copyright:
© 2017 Elsevier Ltd
PY - 2017/9
Y1 - 2017/9
N2 - Much of what is known about prosody is based on native speaker intuitions of idealized speech, or on prosodic annotations from trained annotators whose auditory impressions are augmented by visual evidence from speech waveforms, spectrograms and pitch tracks. Expanding the prosodic data currently available to cover more languages, and to cover a broader range of unscripted speech styles, is prohibitive due to the time, money and human expertise needed for prosodic annotation. We describe an alternative approach to prosodic data collection, with coarse-grained annotations from a cohort of untrained annotators performing rapid prosody transcription (RPT) using LMEDS, an open-source software tool we developed to enable large-scale, crowd-sourced data collection with RPT. Results from three RPT experiments are reported. The reliability of RPT is analysed comparing kappa statistics for lab-based and crowd-sourced annotations for American English, comparing annotators from the same (US) versus different (Indian) dialect groups, and comparing each RPT annotator with a ToBI annotation. Results show better reliability for same-dialect annotators (US), and the best overall reliability from crowd-sourced US annotators, though lab-based annotations are the most similar to ToBI annotations. A generalized additive mixed model is used to test differences among annotator groups in the factors that predict prosodic annotation. Results show that a common set of acoustic and contextual factors predict prosodic labels for all annotator groups, with only small differences among the RPT groups, but with larger effects on prosodic marking for ToBI annotators. The findings suggest methods for optimizing the efficiency of RPT annotations. Overall, crowd-sourced prosodic annotation is shown to be efficient, and to rely on established cues to prosody, supporting its use for prosody research across languages, dialects, speaker populations, and speech genres.
KW - Annotation
KW - Crowd-sourcing
KW - Generalized mixed effects model
KW - Inter-rater reliability
KW - Prosody
KW - Speech transcription
UR - http://www.scopus.com/inward/record.url?scp=85017347058&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85017347058&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2017.02.008
DO - 10.1016/j.csl.2017.02.008
M3 - Article
AN - SCOPUS:85017347058
SN - 0885-2308
VL - 45
SP - 300
EP - 325
JO - Computer Speech and Language
JF - Computer Speech and Language
ER -