Automatic recognition of second language speech-in-noise

Seung Eun Kim*, Bronya R. Chernyak, Olga Seleznova, Joseph Keshet, Matthew Goldrick, Ann R. Bradlow

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Measuring how well human listeners recognize speech under varying environmental conditions (speech intelligibility) is a challenge for theoretical, technological, and clinical approaches to speech communication. The current gold standard—human transcription—is time- and resource-intensive. Recent advances in automatic speech recognition (ASR) systems raise the possibility of automating intelligibility measurement. This study tested four state-of-the-art ASR systems on second language speech-in-noise and found that one, Whisper, performed at or above human-listener accuracy. However, the content of Whisper's responses diverged substantially from human responses, especially at lower signal-to-noise ratios, suggesting both opportunities and limitations for ASR-based modeling of speech intelligibility.
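Comparing ASR output with human transcription accuracy requires scoring responses against a reference sentence. One standard metric for this (used here as an illustration; the abstract does not specify the study's exact scoring procedure) is word error rate, the word-level edit distance normalized by reference length. A minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical example: a transcript of a sentence heard in noise,
# with one word misheard ("a" for "the") -> WER of 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Divergence in response *content* between ASR and human listeners, as reported in the abstract, is exactly what an aggregate score like this can mask: two transcribers can have identical error rates while making different word-level errors.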

Original language: English (US)
Article number: 025204
Journal: JASA Express Letters
Volume: 4
Issue number: 2
DOIs
State: Published - Feb 1 2024

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Music
  • Arts and Humanities (miscellaneous)

