TY - GEN
T1 - Vowel duration measurement using deep neural networks
AU - Adi, Yossi
AU - Keshet, Joseph
AU - Goldrick, Matthew
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/11/10
Y1 - 2015/11/10
N2 - Vowel durations are most often utilized in studies addressing specific issues in phonetics. Thus far, such work has been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for automatic, accurate measurement of vowel duration, where the input to the algorithm is a speech segment that contains one vowel preceded and followed by consonants (CVC). Our algorithm is based on a deep neural network trained at the frame level on manually annotated data from a phonetic study. Specifically, we try two deep-network architectures, a convolutional neural network (CNN) and a deep belief network (DBN), and compare their accuracy to that of an HMM-based forced aligner. Results suggest that the CNN is better than the DBN, and that the CNN and the HMM-based forced aligner are comparable in their results, but neither of them yielded the same predictions as models fit to manually annotated data.
AB - Vowel durations are most often utilized in studies addressing specific issues in phonetics. Thus far, such work has been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for automatic, accurate measurement of vowel duration, where the input to the algorithm is a speech segment that contains one vowel preceded and followed by consonants (CVC). Our algorithm is based on a deep neural network trained at the frame level on manually annotated data from a phonetic study. Specifically, we try two deep-network architectures, a convolutional neural network (CNN) and a deep belief network (DBN), and compare their accuracy to that of an HMM-based forced aligner. Results suggest that the CNN is better than the DBN, and that the CNN and the HMM-based forced aligner are comparable in their results, but neither of them yielded the same predictions as models fit to manually annotated data.
KW - Forced alignment
KW - convolutional neural networks
KW - deep belief networks
KW - hidden Markov models
KW - vowel duration measurement
UR - http://www.scopus.com/inward/record.url?scp=84960910537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84960910537&partnerID=8YFLogxK
U2 - 10.1109/MLSP.2015.7324331
DO - 10.1109/MLSP.2015.7324331
M3 - Conference contribution
C2 - 29034132
AN - SCOPUS:84960910537
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
BT - 2015 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2015
A2 - Erdogmus, Deniz
A2 - Kozat, Serdar
A2 - Larsen, Jan
A2 - Akcakaya, Murat
PB - IEEE Computer Society
T2 - 25th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2015
Y2 - 17 September 2015 through 20 September 2015
ER -