Enhancing analysis of diadochokinetic speech using deep neural networks

Yael Segal-Feldman*, Kasia Hitczenko, Matthew Goldrick, Adam Buchwald, Angela Roberts, Joseph Keshet

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Diadochokinetic (DDK) speech tasks involve the repetitive production of consonant-vowel syllables. These tasks are useful for detecting impairments, differential diagnosis, and monitoring progress in speech-motor impairments. However, manual analysis of these tasks is time-consuming, subjective, and provides only a coarse picture of speech. This paper presents several deep neural network models that operate on the raw waveform to automatically segment stop consonants and vowels from unannotated and untranscribed speech. A deep encoder serves as a feature-extraction module, replacing conventional signal-processing features; diverse architectures, including convolutional neural networks (CNNs) and large self-supervised models such as HuBERT, are applied for this extraction. A decoder model then uses the derived embeddings to classify each frame. For the decoder, the paper studies architectures ranging from linear layers to LSTMs, CNNs, and transformers. These models are assessed on their ability to detect speech rate, sound durations, and boundary locations on a dataset of healthy individuals and an unseen dataset of older individuals with Parkinson's Disease. The results reveal that an LSTM model outperforms all other models on both datasets and is comparable to trained human annotators.
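As a rough illustration of the encoder-decoder design described in the abstract, the following PyTorch sketch pairs a convolutional waveform encoder with a BiLSTM frame classifier. This is a minimal sketch under stated assumptions, not the paper's exact configuration: the label set, layer sizes, and strides are illustrative, and the paper's HuBERT-based variant would replace the convolutional encoder with pretrained self-supervised representations.

```python
import torch
import torch.nn as nn

# Assumed frame label set for illustration: background / stop (VOT) / vowel.
FRAME_CLASSES = 3

class ConvEncoder(nn.Module):
    """Strided 1-D convolutions map raw audio to frame-level embeddings,
    playing the role of conventional signal-processing features."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 128, kernel_size=10, stride=5), nn.GELU(),
            nn.Conv1d(128, 128, kernel_size=8, stride=4), nn.GELU(),
            nn.Conv1d(128, embed_dim, kernel_size=4, stride=2), nn.GELU(),
        )

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> frames: (batch, num_frames, embed_dim)
        return self.net(wav.unsqueeze(1)).transpose(1, 2)

class LSTMDecoder(nn.Module):
    """BiLSTM over frame embeddings with a linear per-frame classifier,
    mirroring the best-performing decoder reported in the abstract."""
    def __init__(self, embed_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, FRAME_CLASSES)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(frames)
        return self.head(out)  # (batch, num_frames, FRAME_CLASSES) logits

if __name__ == "__main__":
    wav = torch.randn(2, 16000)  # two 1-second clips at 16 kHz
    logits = LSTMDecoder()(ConvEncoder()(wav))
    labels = logits.argmax(dim=-1)  # per-frame class predictions
    print(labels.shape)
```

Per-frame labels can then be collapsed into contiguous segments, whose boundaries and durations yield the measures the paper evaluates: voice onset time, vowel duration, and syllable rate.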

Original language: English (US)
Article number: 101715
Journal: Computer Speech and Language
Volume: 90
DOIs
State: Published - Mar 2025

Funding

This work is supported by the Ministry of Science & Technology, Israel (Y. Segal-Feldman); the U.S. National Institutes of Health (NIH; grants R21MH119677, K01DC014298, R01DC018589, R01MH134369); the U.S. National Science Foundation (NSF; grant DRL2219843); the Binational Science Foundation (BSF; grant 2022618); and the Ontario Brain Institute, Canada (an independent non-profit corporation, funded partially by the Ontario government). Matching funds were provided by participating hospitals, the Baycrest Foundation, Bruyère Research Institute, Centre for Addiction and Mental Health Foundation, Canada, London Health Sciences Foundation, McMaster University Faculty of Health Sciences, Ottawa Brain and Mind Research Institute, Queen's University Faculty of Health Sciences, Canada, the Sunnybrook Foundation, the Thunder Bay Regional Health Sciences Centre, the University of Ottawa Faculty of Medicine, and the Windsor/Essex County ALS Association. The Temerty Family Foundation provided infrastructure matching funds. The opinions, results, and conclusions are those of the authors, and no endorsement by the Ontario Brain Institute or NIH is intended or should be inferred. Thanks to Hung-Shao Cheng, Rosemary Dong, Katerina Alexopoulos, Camila Hirani, and Jasmine Tran for their help in data collection and processing. The authors also acknowledge ONDRI's clinical coordinators, including Catarina Downey, Heather Hink, Donna McBain, Lindsey McLeish, and Alicia J. Peltsch.

Keywords

  • DDK
  • Deep neural networks
  • Diadochokinetic speech
  • Parkinson's Disease
  • Voice onset time
  • Vowel duration

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
