Crowd-sourcing prosodic annotation

Jennifer Cole*, Timothy Mahrt, Joseph Roy

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

17 Scopus citations


Much of what is known about prosody is based on native speaker intuitions of idealized speech, or on prosodic annotations from trained annotators whose auditory impressions are augmented by visual evidence from speech waveforms, spectrograms and pitch tracks. Expanding the prosodic data currently available to cover more languages, and to cover a broader range of unscripted speech styles, is prohibitive due to the time, money and human expertise needed for prosodic annotation. We describe an alternative approach to prosodic data collection, with coarse-grained annotations from a cohort of untrained annotators performing rapid prosody transcription (RPT) using LMEDS, an open-source software tool we developed to enable large-scale, crowd-sourced data collection with RPT. Results from three RPT experiments are reported. The reliability of RPT is analysed comparing kappa statistics for lab-based and crowd-sourced annotations for American English, comparing annotators from the same (US) versus different (Indian) dialect groups, and comparing each RPT annotator with a ToBI annotation. Results show better reliability for same-dialect annotators (US), and the best overall reliability from crowd-sourced US annotators, though lab-based annotations are the most similar to ToBI annotations. A generalized additive mixed model is used to test differences among annotator groups in the factors that predict prosodic annotation. Results show that a common set of acoustic and contextual factors predict prosodic labels for all annotator groups, with only small differences among the RPT groups, but with larger effects on prosodic marking for ToBI annotators. The findings suggest methods for optimizing the efficiency of RPT annotations. Overall, crowd-sourced prosodic annotation is shown to be efficient, and to rely on established cues to prosody, supporting its use for prosody research across languages, dialects, speaker populations, and speech genres.

Original language: English (US)
Pages (from-to): 300-325
Number of pages: 26
Journal: Computer Speech and Language
State: Published - Sep 2017


  • Annotation
  • Crowd-sourcing
  • Generalized mixed effects model
  • Inter-rater reliability
  • Prosody
  • Speech transcription

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Human-Computer Interaction

