Understanding the role of prosody in encoding linguistic meaning and in shaping phonetic form requires the analysis of prosodically annotated speech drawn from a wide variety of speech materials. Yet obtaining accurate and reliable prosodic annotations for even small datasets is challenging due to the time and expertise required. We discuss several factors that make prosodic annotation difficult and impact its reliability, all of which relate to variability: in the patterning of prosodic elements (features and structures) as they relate to the linguistic and discourse context, in the acoustic cues for those prosodic elements, and in the parameter values of the cues. We propose two novel methods for prosodic transcription that capture variability as a source of information relevant to the linguistic analysis of prosody. The first is Rapid Prosody Transcription (RPT), which can be performed by non-experts using a simple set of unary labels to mark prominence and boundaries based on immediate auditory impression. Inter-transcriber variability is used to calculate continuous-valued prosody 'scores' that are assigned to each word and represent the perceptual salience of its prosodic features or structure. RPT can be used to model the relative influence of top-down factors and acoustic cues in prosody perception, and to model prosodic variation across many dimensions, including language variety, speech style, or speaker's affect. The second proposed method is the identification of individual cues to the contrastive prosodic elements of an utterance. Cue specification provides a link between the contrastive symbolic categories of prosodic structures and the continuous-valued parameters in the acoustic signal, and offers a framework for investigating how factors related to the grammatical and situational context influence the phonetic form of spoken words and phrases. While cue specification as a transcription tool has not yet been explored as RPT has, it has the potential to provide a level of detail that will be useful in modelling systematic context-governed variation in the implementation of prosodic categories, with applications in automatic speech synthesis and recognition, as well as modelling human speech production and perception. We discuss how RPT and cue specification, particularly when combined, can improve the efficiency and reliability of prosodic transcription and how they can be integrated with expert phonological transcription.
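The way RPT turns inter-transcriber variability into a per-word score can be illustrated with a minimal sketch. It assumes, as one plausible reading of the description above, that a word's score is the proportion of transcribers who marked it; the function name `prosody_scores`, the example words, and the marks are hypothetical and are not taken from any dataset.

```python
from typing import List, Sequence


def prosody_scores(marks: Sequence[Sequence[int]]) -> List[float]:
    """Return one score per word from binary transcriber marks.

    marks[t][w] is 1 if transcriber t marked word w (for example as
    prominent), else 0. ASSUMPTION: the score is the share of
    transcribers who marked the word, so it lies between 0 and 1.
    """
    if not marks:
        raise ValueError("at least one transcriber is required")
    n_words = len(marks[0])
    if any(len(row) != n_words for row in marks):
        raise ValueError("every transcriber must label every word")
    n_transcribers = len(marks)
    return [
        sum(row[w] for row in marks) / n_transcribers
        for w in range(n_words)
    ]


# Hypothetical example: four transcribers mark prominence on four words.
words = ["the", "cat", "sat", "down"]
prominence_marks = [
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 1],
]
scores = prosody_scores(prominence_marks)
for word, score in zip(words, scores):
    print(f"{word}: {score:.2f}")
```

Under this assumption, a word that every transcriber marks scores 1, a word that no one marks scores 0, and intermediate values reflect disagreement, which is what lets the score be read as a graded measure of perceptual salience. The same function could be applied separately to boundary marks.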
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language
- Computer Science Applications