Vowel duration measurement using deep neural networks

Yossi Adi, Joseph Keshet, Matthew Goldrick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

Vowel durations are most often utilized in studies addressing specific issues in phonetics. Thus far, such studies have been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for automatic, accurate measurement of vowel duration, where the input to the algorithm is a speech segment containing one vowel preceded and followed by consonants (CVC). Our algorithm is based on a deep neural network trained at the frame level on manually annotated data from a phonetic study. Specifically, we try two deep-network architectures, a convolutional neural network (CNN) and a deep belief network (DBN), and compare their accuracy to that of an HMM-based forced aligner. Results suggest that the CNN is better than the DBN, and that the CNN and the HMM-based forced aligner are comparable in their results, but neither of them yielded the same predictions as models fit to manually annotated data.
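To make the frame-level formulation concrete, the following is a minimal sketch of how per-frame vowel probabilities could be turned into a single duration for a CVC segment. It is not the authors' decoding procedure, which the abstract does not describe. It assumes a hypothetical classifier that emits one vowel probability per frame, an assumed 10 ms frame shift, and a simple rule: pick the one contiguous run of frames that maximizes the summed log-odds of "vowel" (a maximum-sum subarray search). The names `vowel_boundaries` and `vowel_duration_ms` are illustrative only.

```python
import math


def vowel_boundaries(vowel_probs, eps=1e-6):
    """Return (onset, offset) frame indices of the best single vowel span.

    `vowel_probs` holds one hypothetical per-frame vowel probability.
    The span is the contiguous, non-empty run of frames with the largest
    summed log-odds; `offset` is exclusive.
    """
    if not vowel_probs:
        raise ValueError("need at least one frame")

    best_sum = -math.inf
    best_span = (0, 1)
    run_sum = 0.0
    run_start = 0
    for t, p in enumerate(vowel_probs):
        # Clip so the log-odds stay finite for probabilities of 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        score = math.log(p) - math.log(1.0 - p)
        if run_sum <= 0.0:
            # A non-positive running sum cannot help; start a new run here.
            run_sum = score
            run_start = t
        else:
            run_sum += score
        if run_sum > best_sum:
            best_sum = run_sum
            best_span = (run_start, t + 1)
    return best_span


def vowel_duration_ms(vowel_probs, frame_shift_ms=10.0):
    """Duration of the selected span, assuming a fixed frame shift."""
    onset, offset = vowel_boundaries(vowel_probs)
    return (offset - onset) * frame_shift_ms


# Toy posteriors: a brief dip inside the vowel (frame 4) is bridged.
probs = [0.1, 0.2, 0.9, 0.8, 0.3, 0.9, 0.1, 0.05]
print(vowel_boundaries(probs))   # (2, 6)
print(vowel_duration_ms(probs))  # 40.0
```

Forcing exactly one contiguous span reflects the CVC constraint stated in the abstract (one vowel between two consonants), so an isolated low-probability frame inside the vowel does not split the measurement in two.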

Original language: English (US)
Title of host publication: 2015 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2015
Editors: Deniz Erdogmus, Serdar Kozat, Jan Larsen, Murat Akcakaya
Publisher: IEEE Computer Society
ISBN (Electronic): 9781467374545
DOIs: https://doi.org/10.1109/MLSP.2015.7324331
State: Published - Nov 10 2015
Event: 25th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2015 - Boston, United States
Duration: Sep 17 2015 → Sep 20 2015

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Volume: 2015-November
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Other

Other: 25th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2015
Country: United States
City: Boston
Period: 9/17/15 → 9/20/15

Keywords

  • Forced alignment
  • convolution neural networks
  • deep belief networks
  • hidden Markov models
  • vowel duration measurement

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing


Cite this

Adi, Y., Keshet, J., & Goldrick, M. (2015). Vowel duration measurement using deep neural networks. In D. Erdogmus, S. Kozat, J. Larsen, & M. Akcakaya (Eds.), 2015 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2015 [7324331] (IEEE International Workshop on Machine Learning for Signal Processing, MLSP; Vol. 2015-November). IEEE Computer Society. https://doi.org/10.1109/MLSP.2015.7324331