Speech-to-video synthesis using facial animation parameters

Petar S. Aleksic*, Aggelos K. Katsaggelos

*Corresponding author for this work

Research output: Contribution to conference › Paper

4 Citations (Scopus)

Abstract

The presence of visual information in addition to audio could improve speech understanding in noisy environments. This additional information could be especially useful for people with impaired hearing who are able to speechread. This paper focuses on the problem of synthesizing the Facial Animation Parameters (FAPs), supported by the MPEG-4 standard for the visual representation of speech, from a narrowband acoustic speech (telephone) signal. A correlation Hidden Markov Model (CHMM) system for performing visual speech synthesis is proposed. The CHMM system integrates an independently trained acoustic HMM (AHMM) system and a visual HMM (VHMM) system, in order to realize speech-to-video synthesis. Objective experiments are performed by analyzing the synthesized FAPs and computing the time alignment errors. Time alignment errors are reduced by 40.5% compared to the conventional temporal scaling method.
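The abstract describes coupling an acoustic HMM with a visual HMM so that an audio observation sequence can drive FAP trajectories. As a rough illustration of that idea (not the authors' CHMM, and with all model values invented), the toy sketch below Viterbi-decodes the most likely acoustic state sequence and emits a per-state mean FAP vector for each decoded state:

```python
# Hypothetical sketch: decode acoustic states, then look up per-state FAPs.
# The 2-state HMM, Gaussian emission parameters, and FAP vectors are made up.
import math

def viterbi(obs, states, log_pi, log_A, log_lik):
    """Most likely state path for a 1-D observation sequence."""
    delta = [[log_pi[s] + log_lik(s, obs[0]) for s in states]]
    back = []
    for t in range(1, len(obs)):
        row, ptr = [], []
        for j in states:
            best_i = max(states, key=lambda i: delta[t - 1][i] + log_A[i][j])
            row.append(delta[t - 1][best_i] + log_A[best_i][j] + log_lik(j, obs[t]))
            ptr.append(best_i)
        delta.append(row)
        back.append(ptr)
    path = [max(states, key=lambda s: delta[-1][s])]
    for ptr in reversed(back):          # backtrack through the pointers
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy 2-state acoustic HMM with scalar Gaussian emissions.
states = [0, 1]
log_pi = [math.log(0.5), math.log(0.5)]
log_A = [[math.log(0.9), math.log(0.1)],
         [math.log(0.1), math.log(0.9)]]
means, var = [0.0, 3.0], 1.0

def log_lik(s, x):
    return -0.5 * math.log(2 * math.pi * var) - (x - means[s]) ** 2 / (2 * var)

# Each acoustic state indexes a mean FAP vector (e.g. jaw opening, lip width).
state_to_fap = {0: [0.0, 0.1], 1: [0.8, 0.4]}

audio = [0.1, -0.2, 2.9, 3.2, 0.0]      # toy acoustic features
fap_track = [state_to_fap[s] for s in viterbi(audio, states, log_pi, log_A, log_lik)]
print(fap_track)
```

The paper's CHMM additionally models the correlation between the audio and visual streams rather than using a fixed state-to-FAP lookup; this sketch only shows the decode-then-map skeleton.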

Original language: English (US)
Pages: 1-4
Number of pages: 4
State: Published - Dec 17 2003
Event: 2003 International Conference on Image Processing, ICIP-2003 - Barcelona, Spain
Duration: Sep 14 2003 - Sep 17 2003



ASJC Scopus subject areas

  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering

Cite this

Aleksic, P. S., & Katsaggelos, A. K. (2003). Speech-to-video synthesis using facial animation parameters. Paper presented at the 2003 International Conference on Image Processing, ICIP-2003, Barcelona, Spain, pp. 1-4.