Efficient Stuttering Event Detection Using Siamese Networks

Payal Mohapatra*, Bashima Islam, Md Tamzeed Islam, Ruochen Jiao, Qi Zhu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

Speech disfluency research is pivotal to accommodating atypical speakers in mainstream conversational technology. However, the lack of publicly available labeled and unlabeled datasets is a significant bottleneck to such research. While many works use pseudo dysfluency data with proxy labels and formulate a self-supervised task, we see merit in using real-world data. In this work, we consolidate the corpora of publicly available speech disfluency datasets with and without labels and propose DisfluentSiam - an efficient siamese network-based small-scale pretraining pipeline using task-specific data from multiple domains with only 10M trainable parameters. We show that with DisfluentSiam, we achieve an average of 15% boost in performance across five types of dysfluency event detection compared to direct wav2vec 2.0 embeddings. In particular, with only 4-5 mins of labeled data for fine-tuning, the DisfluentSiam demonstrates the advantage of task-specific pretraining with up to 25% higher accuracy.
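
The abstract does not state the pretraining objective, so the following is only a minimal sketch of one common siamese objective: a SimSiam-style symmetric negative cosine similarity between two views of the same utterance. The function names, the toy vectors, and the choice of this loss are all assumptions for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def siamese_loss(p1, z1, p2, z2):
    """Symmetric negative cosine similarity (assumed SimSiam-style objective).

    p1, p2: predictor outputs for view 1 and view 2 (hypothetical names).
    z1, z2: encoder/projector outputs for the same views; in a training
            framework these would be treated as constants (stop-gradient).
    The result lies in [-1, 1]; lower means the two views agree more.
    """
    return -0.5 * (cosine_similarity(p1, z2) + cosine_similarity(p2, z1))


# Toy embeddings standing in for pooled speech features of two views.
view1 = [0.2, 0.9, 0.1]
view2 = [0.25, 0.85, 0.05]
loss = siamese_loss(view1, view1, view2, view2)
```

In a sketch like this the two views would typically come from augmentations of the same audio segment, with frame-level features (for example from wav2vec 2.0) pooled into a single vector before the loss is applied.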

Original language: English (US)
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
State: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: Jun 4 2023 → Jun 10 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 6/4/23 → 6/10/23

Keywords

  • Dysfluency
  • Self-supervised Learning

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
