Audio-visual and visual-only speech and speaker recognition: Issues about theory, system design, and implementation

Derek J. Shiell, Louis H. Terry, Petar S. Aleksic, Aggelos K. Katsaggelos

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

6 Scopus citations

Abstract

The information embedded in the visual dynamics of speech has the potential to improve the performance of speech and speaker recognition systems. The information carried in the visual speech signal complements the information in the acoustic speech signal, which is particularly beneficial in adverse acoustic environments. Non-invasive methods using low-cost sensors can be used to obtain acoustic and visual biometric signals, such as a person's voice and lip movement, with little user cooperation. Unobtrusive biometric systems of this kind are needed to promote widespread adoption of biometric technology in today's society. In this chapter, the authors describe the main components and theory of audio-visual and visual-only speech and speaker recognition systems. Audio-visual corpora are described and a number of speech and speaker recognition systems are reviewed. Finally, the authors discuss various open issues in system design and implementation, and present future research and development directions in this area.

Original language: English (US)
Title of host publication: Visual Speech Recognition
Subtitle of host publication: Lip Segmentation and Mapping
Publisher: IGI Global
Pages: 1-38
Number of pages: 38
ISBN (Print): 9781605661865
State: Published - 2009

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
