Vector quantization with memory and multi-labeling for isolated video-only automatic speech recognition

Louis H. Terry, Derek J. Shiell, Aggelos K. Katsaggelos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a vector quantizer (VQ) with memory for automatic speech recognition (ASR) and compare its recognition performance with that of traditional memoryless VQ for ASR. Standard VQ for ASR quantizes each speech observation independently of any past information. We introduce memory through a probabilistic model of the quantization state: an ergodic hidden Markov model (HMM) in which the state occupied by the HMM represents the quantization label. We evaluate this approach on video-only isolated digit ASR and implement both single-stream (single-labeling) and multi-stream (multi-labeling) systems. For single-stream recognition, our approach increases the recognition rate from 62.67% to 66.95%. With multi-labeling, the proposed vector quantizer with memory consistently outperforms the memoryless vector quantizer.
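The contrast between the two quantizers can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the emission model, the transition probabilities, or the decoding rule, so the isotropic Gaussian emission centered on each codeword, the `sigma` parameter, the hand-picked "sticky" transition matrix, and the use of Viterbi decoding are all assumptions made here for illustration. The function names `memoryless_vq` and `vq_with_memory` are likewise hypothetical.

```python
import math


def sq_dist(x, c):
    """Squared Euclidean distance between a frame and a codeword."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def memoryless_vq(frames, codebook):
    """Label each frame with its nearest codeword, ignoring all other frames."""
    return [min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))
            for x in frames]


def vq_with_memory(frames, codebook, log_trans, log_init, sigma=1.0):
    """Label frames with the most likely state path of an ergodic HMM.

    Each HMM state corresponds to one codeword (one quantization label).
    Assumption: emissions are isotropic Gaussians centered on the codewords,
    so the log-likelihood is -||x - c_k||^2 / (2 sigma^2) up to a constant.
    """
    if not frames:
        return []
    K = len(codebook)

    def log_emit(x, k):
        return -sq_dist(x, codebook[k]) / (2.0 * sigma ** 2)

    # Viterbi forward pass: best log-score of any path ending in each state.
    score = [log_init[k] + log_emit(frames[0], k) for k in range(K)]
    back = []
    for x in frames[1:]:
        prev = score
        ptr = [max(range(K), key=lambda j: prev[j] + log_trans[j][k])
               for k in range(K)]
        score = [prev[ptr[k]] + log_trans[ptr[k]][k] + log_emit(x, k)
                 for k in range(K)]
        back.append(ptr)

    # Backtrack from the best final state to recover the label sequence.
    state = max(range(K), key=lambda k: score[k])
    labels = [state]
    for ptr in reversed(back):
        state = ptr[state]
        labels.append(state)
    labels.reverse()
    return labels


# Toy 1-D data: the middle frame is slightly closer to codeword 1.
codebook = [(0.0,), (1.0,)]
frames = [(0.1,), (0.6,), (0.2,)]

# Assumed transition matrix that favors staying in the same label.
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]

print(memoryless_vq(frames, codebook))                         # [0, 1, 0]
print(vq_with_memory(frames, codebook, log_trans, log_init))   # [0, 0, 0]
```

In this toy case the memoryless quantizer flips its label on the ambiguous middle frame, while the HMM-based quantizer keeps label 0 because a single weakly supported switch costs more than it gains. In practice the transition and emission parameters would be estimated from training data rather than set by hand.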

Original language: English (US)
Title of host publication: 2008 IEEE International Conference on Image Processing, ICIP 2008 Proceedings
Pages: 1320-1323
Number of pages: 4
State: Published - 2008
Event: 2008 IEEE International Conference on Image Processing, ICIP 2008 - San Diego, CA, United States
Duration: Oct 12 2008 to Oct 15 2008

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Other

Other: 2008 IEEE International Conference on Image Processing, ICIP 2008
Country/Territory: United States
City: San Diego, CA
Period: 10/12/08 to 10/15/08

Keywords

  • Hidden Markov models
  • Speech recognition
  • Vector quantization

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Signal Processing
