Modeling the perception of frequency-shifted vowels

Peter F. Assmann, Terrance M. Nearey, Jack M. Scott

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Scopus citations

Abstract

A notable property of speech perception is that intelligibility is preserved when the spectrum is shifted up or down along the frequency scale, across a fairly wide range. To study the relationship between fundamental frequency (F0) and spectrum envelope shifts in vowel perception, we used a high-quality vocoder (STRAIGHT) to process a set of vowels spoken by three adult males in /hVd/ context. Identification accuracy dropped by about 30% when the spectrum envelope was scaled upward by a factor of 2.0 and, in a separate condition, by about 50% when F0 was raised by 2 octaves. However, when the spectrum envelope and F0 were raised together, identification accuracy improved markedly compared with the conditions in which each cue was manipulated separately. This synergy between formant frequencies and F0 was predicted by a model that accounts for the intelligibility of frequency-shifted vowels in terms of learned relationships between measured values of F0 and formant frequencies. A second model, based on auditory excitation patterns, predicted the main effects of F0 and spectrum envelope but not the pattern of interaction.
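To make the size of these manipulations concrete: raising F0 by 2 octaves multiplies it by 2² = 4, whereas scaling the spectrum envelope by 2.0 moves every formant up by one octave. The short Python sketch below applies the two shifts independently to a single token. It is illustrative only: the function names and the F0 and formant values are hypothetical placeholders rather than measurements from the study, the code does no signal processing (it does not reproduce STRAIGHT), and it does not implement either of the authors' models.

```python
import math


def octaves_to_ratio(octaves: float) -> float:
    """Frequency ratio for a shift of `octaves` octaves (1 octave = factor of 2)."""
    return 2.0 ** octaves


def ratio_to_octaves(ratio: float) -> float:
    """Shift in octaves corresponding to a frequency ratio."""
    return math.log2(ratio)


def scale_vowel(f0_hz, formants_hz, f0_ratio=1.0, envelope_ratio=1.0):
    """Apply independent multiplicative shifts to F0 and to the spectrum envelope.

    Scaling the envelope by `envelope_ratio` multiplies every formant frequency
    by that factor; F0 is scaled separately by `f0_ratio`.
    """
    return f0_hz * f0_ratio, [f * envelope_ratio for f in formants_hz]


# Hypothetical adult-male token: illustrative values, not data from the study.
F0_HZ = 100.0
FORMANTS_HZ = [500.0, 1500.0, 2500.0]

# The two single-cue conditions described in the abstract.
conditions = {
    "envelope x2.0": dict(envelope_ratio=2.0),
    "F0 +2 octaves": dict(f0_ratio=octaves_to_ratio(2)),
}

for name, kwargs in conditions.items():
    f0, formants = scale_vowel(F0_HZ, FORMANTS_HZ, **kwargs)
    print(f"{name}: F0 = {f0:.0f} Hz, formants = {[round(f) for f in formants]} Hz")
```

With these placeholder values the envelope condition yields formants of 1000, 3000 and 5000 Hz at an unchanged 100 Hz F0, while the F0 condition yields a 400 Hz F0 with unchanged formants; `ratio_to_octaves(2.0)` returns 1.0, confirming that the envelope shift is a one-octave shift.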

Original language: English (US)
Title of host publication: 7th International Conference on Spoken Language Processing, ICSLP 2002
Publisher: International Speech Communication Association
Pages: 425-428
Number of pages: 4
State: Published - Jan 1 2002
Event: 7th International Conference on Spoken Language Processing, ICSLP 2002 - Denver, United States
Duration: Sep 16 2002 to Sep 20 2002

Other

Other: 7th International Conference on Spoken Language Processing, ICSLP 2002
Country/Territory: United States
City: Denver
Period: 9/16/02 to 9/20/02

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
