Abstract
Voice quality conveys both linguistic and paralinguistic information and can be distinguished by acoustic source characteristics. We assign objective voice quality labels based on the spectral and temporal structure of speech sounds, specifically the harmonic structure (H1-H2, the amplitude difference between the first and second harmonics) and the mean autocorrelation ratio of each phone. A classification experiment with a Support Vector Machine (SVM) classifier shows that allophones differing in voice quality can be classified as distinct using the input features used in speech recognition. Among the possible ways to incorporate voice quality information into speech recognition, we demonstrate that explicitly modeling voice quality variance in the acoustic phone models, using hidden Markov modeling, improves word recognition accuracy.
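To make the two acoustic measures concrete, the following is a minimal sketch of how H1-H2 and an autocorrelation-based periodicity measure could be computed for a single voiced frame. It is not the procedure used in the chapter: the function names `h1_h2_db` and `autocorr_ratio`, the Hann window, the 10% harmonic search band, the FFT size, and the assumption that the fundamental frequency `f0` is already known are all illustrative choices. The chapter's exact definition of the mean autocorrelation ratio is not given in the abstract, so the normalized autocorrelation at one pitch period is used here as a stand-in.

```python
import numpy as np


def h1_h2_db(frame, sample_rate, f0, n_fft=4096, band=0.1):
    """Difference in dB between the first and second harmonic amplitudes.

    Assumes f0 (Hz) is supplied by an external pitch tracker (hypothetical input).
    """
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

    def harmonic_peak(k):
        # Largest spectral magnitude within +/- band * f0 of the k-th harmonic.
        mask = np.abs(freqs - k * f0) <= band * f0
        return magnitude[mask].max()

    return 20.0 * np.log10(harmonic_peak(1)) - 20.0 * np.log10(harmonic_peak(2))


def autocorr_ratio(frame, sample_rate, f0):
    """Normalized autocorrelation at a lag of one pitch period.

    Values near 1 indicate a strongly periodic frame; values near 0
    indicate an aperiodic one. This is an assumed stand-in, not the
    chapter's definition.
    """
    lag = int(round(sample_rate / f0))
    head, tail = frame[:-lag], frame[lag:]
    denom = np.sqrt(np.sum(head ** 2) * np.sum(tail ** 2))
    return float(np.sum(head * tail) / denom)


# Synthetic 40 ms frame: 200 Hz fundamental plus a second harmonic at half amplitude.
sample_rate = 16000
t = np.arange(640) / sample_rate
voiced = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
noise = np.random.default_rng(0).standard_normal(640)

h1h2 = h1_h2_db(voiced, sample_rate, f0=200.0)
periodic_score = autocorr_ratio(voiced, sample_rate, f0=200.0)
noise_score = autocorr_ratio(noise, sample_rate, f0=200.0)
```

For the synthetic frame, the second harmonic has half the amplitude of the first, so `h1h2` should come out close to 20·log10(2), about 6 dB. In practice such values would be averaged over the frames of each phone before being used to assign voice quality labels or as classifier features.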
Original language | English (US) |
---|---|
Title of host publication | Linguistic Patterns of Spontaneous Speech |
Editors | S Tseng |
Place of Publication | Taipei, Taiwan |
Publisher | Academia Sinica |
Pages | 77-100 |
Number of pages | 24 |
State | Published - 2009 |