OBJECTIVES: The ability of human listeners to identify consonants (presented as nonsense syllables) on the basis of primarily temporal information was compared with the predictions of a simple model. The model was based on the amplitude modulation spectra of the stimuli, calculated for six octave-spaced carrier frequencies (250 to 8000 Hz) and six octave-spaced amplitude modulation frequencies (1 to 32 Hz).

DESIGN: The listeners and the model were presented with 16 phonemes, each spoken by four different talkers and processed so that one, two, four, or eight bands of spectral information remained. The average modulation spectrum of each processed phoneme was extracted, and similarity across phonemes was calculated using a spectral correlation index (SCI).

RESULTS: The similarity of the modulation spectra across phonemes, as assessed by the SCI, was a strong predictor of the confusions made by human listeners.

CONCLUSIONS: This result suggests that a sparse set of time-averaged patterns of modulation energy can capture a meaningful aspect of the information listeners use to distinguish among speech signals.
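The abstract does not give the formula for the SCI, so the following is only a minimal sketch under a stated assumption: that similarity between two phonemes is a Pearson correlation taken over the 36 values of their 6 x 6 modulation spectra (carrier band by modulation frequency). The function name `spectral_correlation_index` and the toy spectra are hypothetical and are not taken from the study.

```python
import math

# Octave-spaced grids as stated in the abstract.
CARRIER_HZ = [250 * 2**k for k in range(6)]   # 250, 500, ..., 8000 Hz
MODULATION_HZ = [2**k for k in range(6)]      # 1, 2, ..., 32 Hz


def spectral_correlation_index(spec_a, spec_b):
    """Pearson correlation between two flattened modulation spectra.

    ASSUMPTION: the abstract names the SCI but does not define it; a plain
    Pearson correlation over all carrier x modulation cells is used here
    purely for illustration.
    """
    a = [value for row in spec_a for value in row]
    b = [value for row in spec_b for value in row]
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of cells")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a flat spectrum")
    return cov / math.sqrt(var_a * var_b)


# Toy (made-up) modulation spectra: rows are carrier bands, columns are
# modulation frequencies. Real values would come from the processed speech.
toy_a = [[(i + 1) * (j + 2) for j in range(6)] for i in range(6)]
toy_b = [[2 * v + 3 for v in row] for row in toy_a]   # linearly rescaled copy
toy_c = [[-v for v in row] for row in toy_a]          # inverted pattern

sci_same_shape = spectral_correlation_index(toy_a, toy_b)
sci_inverted = spectral_correlation_index(toy_a, toy_c)
```

Under this assumption, a high index for a pair of phonemes would mean that their time-averaged modulation patterns are alike, which is the kind of similarity the abstract relates to listener confusions.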
ASJC Scopus subject areas
- Speech and Hearing