TY - GEN
T1 - Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition
AU - Terry, Louis H.
AU - Shiell, Derek J.
AU - Katsaggelos, Aggelos K.
PY - 2008
Y1 - 2008
N2 - Most current audio-visual automatic speech recognition (AV-ASR) systems use static weights to balance audio and visual information during information fusion. State-of-the-art research has led to using audio reliability metrics to dynamically change the fusion weights and thereby improve overall recognition results. So far, however, incorporating visual reliability metrics into these audio reliability metric based systems has not significantly improved performance. We introduce a new approach to this problem by inferring the "consistency" between the audio and visual information and leveraging the existing audio reliability metrics to create a video reliability metric. Our approach is formulated in the extracted feature space and, thus, does not rely on analyzing the actual video signal itself. The framework presented in this work competes with the audio-only reliability metric based systems and shows promise to consistently outperform them.
AB - Most current audio-visual automatic speech recognition (AV-ASR) systems use static weights to balance audio and visual information during information fusion. State-of-the-art research has led to using audio reliability metrics to dynamically change the fusion weights and thereby improve overall recognition results. So far, however, incorporating visual reliability metrics into these audio reliability metric based systems has not significantly improved performance. We introduce a new approach to this problem by inferring the "consistency" between the audio and visual information and leveraging the existing audio reliability metrics to create a video reliability metric. Our approach is formulated in the extracted feature space and, thus, does not rely on analyzing the actual video signal itself. The framework presented in this work competes with the audio-only reliability metric based systems and shows promise to consistently outperform them.
KW - Hidden Markov models
KW - Speech recognition
KW - Vector quantization
UR - http://www.scopus.com/inward/record.url?scp=69949118452&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69949118452&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2008.4712005
DO - 10.1109/ICIP.2008.4712005
M3 - Conference contribution
AN - SCOPUS:69949118452
SN - 1424417643
SN - 9781424417643
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 1316
EP - 1319
BT - 2008 IEEE International Conference on Image Processing, ICIP 2008 Proceedings
T2 - 2008 IEEE International Conference on Image Processing, ICIP 2008
Y2 - 12 October 2008 through 15 October 2008
ER -