Voice Quality Dependent Speech Recognition

Tae Jin Yoon, Xiaodan Zhuang, Jennifer Sandra Cole, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

Voice quality conveys both linguistic and paralinguistic information, and can be distinguished by acoustic source characteristics. We label objective voice quality categories based on the spectral and temporal structure of speech sounds, specifically the harmonic structure (H1-H2) and the mean autocorrelation ratio of each phone. Results from a classification experiment using a Support Vector Machine (SVM) classifier show that allophones that differ from each other regarding voice quality can be classified as distinct using input features in speech recognition. Among different possible ways to incorporate voice quality information in speech recognition, we demonstrate that by explicitly modeling voice quality variance in the acoustic phone models using hidden Markov modeling, we can improve word recognition accuracy.
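
The two acoustic measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give their exact definitions, so the code assumes H1-H2 is the level difference in dB between the first and second harmonics, and that the autocorrelation ratio is the normalized autocorrelation at a lag of one pitch period. The function names, the known f0, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import math


def harmonic_amplitude(signal, sample_rate, freq):
    """Amplitude of the sinusoidal component at `freq` Hz (single-bin DFT)."""
    w = 2.0 * math.pi * freq / sample_rate
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)


def h1_minus_h2(signal, sample_rate, f0):
    """Assumed definition: level of harmonic 1 minus level of harmonic 2, in dB."""
    h1 = harmonic_amplitude(signal, sample_rate, f0)
    h2 = harmonic_amplitude(signal, sample_rate, 2.0 * f0)
    return 20.0 * math.log10(h1 / h2)


def autocorrelation_ratio(signal, lag):
    """Assumed definition: normalized autocorrelation at `lag` samples.

    Values near 1.0 indicate a strongly periodic signal at that lag.
    """
    head = signal[: len(signal) - lag]
    tail = signal[lag:]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return num / den


# Hypothetical test signal: 100 Hz fundamental plus a second harmonic at
# half the amplitude, sampled at 8 kHz for exactly 10 pitch periods.
SAMPLE_RATE = 8000
F0 = 100.0
signal = [
    math.sin(2.0 * math.pi * F0 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2.0 * math.pi * 2.0 * F0 * n / SAMPLE_RATE)
    for n in range(800)
]

h1h2_db = h1_minus_h2(signal, SAMPLE_RATE, F0)
ac_ratio = autocorrelation_ratio(signal, lag=int(SAMPLE_RATE / F0))
```

For this synthetic signal the second harmonic has half the amplitude of the first, so `h1h2_db` comes out at about 6 dB, and `ac_ratio` is essentially 1.0 because the signal is perfectly periodic. Real speech would need pitch tracking and windowing, which the sketch leaves out.
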
Original language: English (US)
Title of host publication: Linguistic Patterns of Spontaneous Speech
Editors: S Tseng
Place of Publication: Taipei, Taiwan
Publisher: Academica Sinica
Pages: 77-100
Number of pages: 24
State: Published - 2009
