Automatic recognition of second language speech-in-noise

Seung Eun Kim*, Bronya R. Chernyak, Olga Seleznova, Joseph Keshet, Matthew Goldrick, Ann R. Bradlow

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Measuring how well human listeners recognize speech under varying environmental conditions (speech intelligibility) is a challenge for theoretical, technological, and clinical approaches to speech communication. The current gold standard, human transcription, is time- and resource-intensive. Recent advances in automatic speech recognition (ASR) systems raise the possibility of automating intelligibility measurement. This study tested four state-of-the-art ASR systems on second language speech-in-noise and found that one, Whisper, performed at or above human listener accuracy. However, the content of Whisper's responses diverged substantially from human responses, especially at lower signal-to-noise ratios, suggesting both opportunities and limitations for ASR-based speech intelligibility modeling.
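
The general workflow the abstract describes (mixing speech with noise at controlled signal-to-noise ratios, transcribing the mixtures with an ASR system, and scoring the output against reference transcripts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: it uses the open-source openai-whisper, jiwer, numpy, and soundfile packages, and the file names, model size, reference transcript, and SNR levels are all hypothetical placeholders.

```python
# Minimal sketch: ASR-based intelligibility scoring at several SNRs.
# Assumptions: openai-whisper, jiwer, numpy, soundfile installed;
# mono audio; the noise file is at least as long as the speech file.
import re

import numpy as np
import soundfile as sf
import whisper
import jiwer


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db."""
    noise = noise[: len(speech)]  # assumes noise >= speech in length
    speech_rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2))
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    mixed = speech + noise * (target_noise_rms / noise_rms)
    # Rescale only if the mixture would clip.
    return mixed / max(1.0, np.abs(mixed).max())


def normalize(text):
    """Crude text normalization before scoring (lowercase, strip punctuation)."""
    return " ".join(re.sub(r"[^a-z' ]+", " ", text.lower()).split())


# Hypothetical inputs: a clean utterance, a noise recording, and the
# human-verified reference transcript for that utterance.
speech, sr = sf.read("utterance.wav")
noise, _ = sf.read("babble_noise.wav")
reference = "the boy fell from the window"

model = whisper.load_model("base")  # model size is an assumption
for snr_db in (0, -4, -8):  # illustrative SNR levels, not the paper's
    sf.write("mixed.wav", mix_at_snr(speech, noise, snr_db), sr)
    hypothesis = model.transcribe("mixed.wav", language="en")["text"]
    # Word error rate is one proxy for intelligibility; the study also
    # compared the *content* of responses, not just overall accuracy.
    print(snr_db, jiwer.wer(reference, normalize(hypothesis)))
```

In a setup like this, ASR accuracy at each SNR can be compared against human transcription accuracy on the same mixtures, which is the kind of comparison the abstract reports.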

Original language: English (US)
Article number: 025204
Journal: JASA Express Letters
Volume: 4
Issue number: 2
State: Published - Feb 1 2024

Funding

This work was supported by NSF DRL Grant No. 2219843 and BSF Grant No. 2022618. Thanks to Chun Chan for assistance with human data collection.

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Music
  • Arts and Humanities (miscellaneous)
