Audiovisual Fusion: Challenges and New Approaches

Aggelos K. Katsaggelos, Sara Bahaadini, Rafael Molina

Research output: Contribution to journal › Article › peer-review

111 Scopus citations

Abstract

In this paper, we review recent results on audiovisual (AV) fusion. We also discuss some of the challenges and report on approaches to address them. One important issue in AV fusion is how the modalities interact and influence each other. This review will address this question in the context of AV speech processing, and especially speech recognition, where one of the issues is that the modalities interact yet sometimes appear to desynchronize from each other. An additional issue that sometimes arises is that one of the modalities may be missing at test time, although it is available at training time; for example, it may be possible to collect AV training data while only having access to audio at test time. We will review approaches to address this issue from the area of multiview learning, where the goal is to learn a model or representation for each of the modalities separately while taking advantage of the rich multimodal training data. In addition to multiview learning, we also discuss the recent application of deep learning (DL) toward AV fusion. We finally draw conclusions and offer our assessment of the future in the area of AV fusion.

Original language: English (US)
Article number: 7194741
Pages (from-to): 1635-1653
Number of pages: 19
Journal: Proceedings of the IEEE
Volume: 103
Issue number: 9
State: Published - Sep 1 2015

Keywords

  • Audiovisual (AV) fusion
  • deep learning (DL)
  • machine learning
  • multimodal analysis
  • multiview learning
  • stream asynchrony

ASJC Scopus subject areas

  • General Computer Science
  • Electrical and Electronic Engineering
