Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition

Louis H. Terry, Derek J. Shiell, Aggelos K. Katsaggelos

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Scopus citations

Abstract

Most current audio-visual automatic speech recognition (AV-ASR) systems use static weights to balance audio and visual information during fusion. State-of-the-art research has led to the use of audio reliability metrics to change the fusion weights dynamically, which improves overall recognition results. So far, however, incorporating visual reliability metrics into these audio-reliability-based systems has not significantly improved performance. We introduce a new approach to this problem: we infer the "consistency" between the audio and visual information and leverage the existing audio reliability metrics to create a video reliability metric. Our approach is formulated in the extracted feature space and thus does not rely on analyzing the actual video signal itself. The framework presented in this work is competitive with systems based on audio-only reliability metrics and shows promise of consistently outperforming them.
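
As an illustration of the stream weighting the abstract refers to, the following is a minimal sketch of generic weighted log-likelihood fusion, in which the audio weight is driven by a reliability score. It is not the method of the paper: the function names, the sigmoid mapping from reliability to weight, and the example scores are all assumptions made for illustration, and the paper's consistency-based video reliability metric is not reproduced here.

```python
import math


def audio_weight_from_reliability(reliability, steepness=1.0, midpoint=0.0):
    """Map an audio reliability score to a weight in (0, 1).

    Assumption: a sigmoid mapping, chosen only for illustration.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (reliability - midpoint)))


def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Combine per-class log-likelihoods with weights that sum to one."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_ll[label] + video_weight * video_ll[label]
        for label in audio_ll
    }


def recognize(audio_ll, video_ll, reliability):
    """Return the class with the highest fused score."""
    weight = audio_weight_from_reliability(reliability)
    fused = fuse_log_likelihoods(audio_ll, video_ll, weight)
    return max(fused, key=fused.get)


# Hypothetical scores: the audio stream favours "yes", the video stream "no".
audio_scores = {"yes": -1.0, "no": -3.0}
video_scores = {"yes": -4.0, "no": -1.0}

clean_audio_result = recognize(audio_scores, video_scores, reliability=10.0)
noisy_audio_result = recognize(audio_scores, video_scores, reliability=-10.0)
```

With a static weight, `audio_weight` would be fixed for every utterance. In the dynamic case sketched here it changes with the reliability score, so the decision follows the audio stream when the audio is judged reliable and the video stream otherwise.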

Original language: English (US)
Title of host publication: 2008 IEEE International Conference on Image Processing, ICIP 2008 Proceedings
Pages: 1316-1319
Number of pages: 4
State: Published - 2008
Event: 2008 IEEE International Conference on Image Processing, ICIP 2008 - San Diego, CA, United States
Duration: Oct 12 2008 to Oct 15 2008

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Other

Other: 2008 IEEE International Conference on Image Processing, ICIP 2008
Country/Territory: United States
City: San Diego, CA
Period: 10/12/08 to 10/15/08

Keywords

  • Hidden Markov models
  • Speech recognition
  • Vector quantization

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Signal Processing
