TY - JOUR
T1 - DDKtor: Automatic Diadochokinetic Speech Analysis
T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
AU - Segal, Yael
AU - Hitczenko, Kasia
AU - Goldrick, Matthew
AU - Buchwald, Adam
AU - Roberts, Angela Christine
AU - Keshet, Joseph
N1 - Funding Information:
This work is supported by the Ministry of Science & Technology, Israel (Y. Segal); U.S. National Institutes of Health (NIH; grants R21MH119677, K01DC014298, R01DC018589); and the Ontario Brain Institute with matching funds provided by participating hospitals, the Windsor/Essex County ALS Association and the Temerty Family Foundation. The opinions, results, and conclusions are those of the authors and no endorsement by the Ontario Brain Institute or NIH is intended or should be inferred. Thanks to Hung-Shao Cheng, Rosemary Dong, Katerina Alexopoulos, Camila Hirani, and Jasmine Tran for help in data collection and processing.
Publisher Copyright:
Copyright © 2022 ISCA.
PY - 2022
Y1 - 2022
N2 - Diadochokinetic speech tasks (DDK), in which participants repeatedly produce syllables, are commonly used as part of the assessment of speech motor impairments. These assessments rely on manual analyses that are time-intensive, subjective, and provide only a coarse-grained picture of speech. This paper presents two deep neural network models that automatically segment consonants and vowels from unannotated, untranscribed speech. Both models operate on the raw waveform and use convolutional layers for feature extraction. The first model is based on an LSTM classifier followed by fully connected layers, while the second adds more convolutional layers followed by fully connected layers. The segmentations predicted by the models are used to obtain measures of speech rate and sound duration. Results on a dataset of young healthy individuals show that our LSTM model outperforms current state-of-the-art systems and performs comparably to trained human annotators. Moreover, the LSTM model remains comparable to trained human annotators when evaluated on an unseen dataset of older individuals with Parkinson's Disease.
AB - Diadochokinetic speech tasks (DDK), in which participants repeatedly produce syllables, are commonly used as part of the assessment of speech motor impairments. These assessments rely on manual analyses that are time-intensive, subjective, and provide only a coarse-grained picture of speech. This paper presents two deep neural network models that automatically segment consonants and vowels from unannotated, untranscribed speech. Both models operate on the raw waveform and use convolutional layers for feature extraction. The first model is based on an LSTM classifier followed by fully connected layers, while the second adds more convolutional layers followed by fully connected layers. The segmentations predicted by the models are used to obtain measures of speech rate and sound duration. Results on a dataset of young healthy individuals show that our LSTM model outperforms current state-of-the-art systems and performs comparably to trained human annotators. Moreover, the LSTM model remains comparable to trained human annotators when evaluated on an unseen dataset of older individuals with Parkinson's Disease.
KW - DDK
KW - Deep neural networks
KW - Diadochokinetic speech
KW - Parkinson's Disease
KW - Voice onset time
KW - Vowel duration
UR - http://www.scopus.com/inward/record.url?scp=85140082566&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140082566&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-311
DO - 10.21437/Interspeech.2022-311
M3 - Conference article
AN - SCOPUS:85140082566
SN - 2308-457X
VL - 2022-September
SP - 4611
EP - 4615
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 18 September 2022 through 22 September 2022
ER -