Lip feature extraction and feature evaluation in the context of speech and speaker recognition

Petar S. Aleksic, Aggelos K. Katsaggelos

Research output: Chapter in Book/Report/Conference proceeding › Chapter

2 Scopus citations

Abstract

There has been significant work investigating the relationship between articulatory movements and vocal tract shape and speech acoustics (Fant, 1960; Flanagan, 1965; Narayanan & Alwan, 2000; Schroeter & Sondhi, 1994). It has been shown that there exists a strong correlation between face motion and both vocal tract shape and speech acoustics (Grant & Braida, 1991; Massaro & Stork, 1998; Summerfield, 1979, 1987, 1992; Williams & Katsaggelos, 2002; Yehia, Rubin, & Vatikiotis-Bateson, 1998). In particular, dynamic lip information conveys information that is not only correlated with but also complementary to the acoustic speech information. Its integration into an automatic speech recognition (ASR) system, resulting in an audio-visual (AV) system, can potentially increase the system's performance. Although visual speech information is usually used together with acoustic information, there are applications where visual-only (V-only) ASR systems can achieve high recognition rates. These include small-vocabulary ASR (digits, a small number of commands, etc.) and ASR in the presence of adverse acoustic conditions. The choice and accurate extraction of visual features strongly affect the performance of AV and V-only ASR systems. The establishment of lip features for speech recognition is a relatively new research topic. Although a number of approaches can be used for extracting and representing visual lip information, limited work exists in the literature comparing the relative performance of different features. In this chapter, the authors describe various approaches for extracting and representing important visual features, review existing systems, evaluate their relative performance in terms of speech and speaker recognition rates, and discuss future research and development directions in this area.

Original language: English (US)
Title of host publication: Visual Speech Recognition
Subtitle of host publication: Lip Segmentation and Mapping
Publisher: IGI Global
Pages: 39-69
Number of pages: 31
ISBN (Print): 9781605661865
State: Published - Dec 1 2009

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
