ETRO VUB

ETRO Publications

Full Details

Conference Publication

Realistic Visual Speech Synthesis Based on AAM Features and an Articulatory DBN Model with Constrained Asynchrony

Authors: P. Wu, D. Jiang, Z. He and H. Sahli

Publication Year: 2011

Pages: 59-64


Abstract:

This paper presents a photo-realistic visual speech synthesis method based on an audio-visual articulatory dynamic Bayesian network model (AF_AVDBN), in which the maximum asynchronies between the articulatory features, such as lips, tongue, and glottis/velum, can be controlled. Perceptual linear prediction (PLP) features from the audio speech and active appearance model (AAM) features from mouth images of the visual speech are used to train the AF_AVDBN model for continuous speech. An EM-based optimal visual feature learning algorithm is derived, given the input audio speech and the trained AF_AVDBN parameters. Finally, photo-realistic mouth images are synthesized from the learned AAM features. In the experiments, mouth animations are synthesized for 30 connected-digit audio speech sentences. Objective evaluation results show that the visual features learned using AF_AVDBN track the real parameters much more closely than those from the audio-visual state-synchronous DBN model (SS_DBN, the DBN implementation of the multi-stream hidden Markov model) and the state-asynchronous DBN model (SA_DBN). Subjective evaluation results show that, by considering the asynchronies between articulatory features in the AF_AVDBN (and between audio and visual states in the SA_DBN), good synchronization between the audio speech and the mouth animations is obtained. Moreover, since AF_AVDBN captures the dynamic movements of articulatory features and models the pronunciation process more precisely, the accuracy of its mouth animations is much higher than that of the SA_DBN and SS_DBN models. Accurate, clear, and natural mouth animations can thus be obtained through the AF_AVDBN model and AAM features.
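The idea of a controllable maximum asynchrony can be illustrated with a small sketch. It is not the paper's AF_AVDBN implementation. It only assumes that each stream (for example lips, tongue, glottis/velum) occupies an integer position within a shared unit, and that a joint configuration is allowed when the positions differ by at most a chosen bound. The function name `allowed_joint_states` and its parameters are hypothetical.

```python
from itertools import product


def allowed_joint_states(num_positions, num_streams, max_async):
    """Enumerate joint stream positions whose spread is at most max_async.

    Hypothetical illustration: each stream has an integer position in
    range(num_positions); a joint configuration is kept only if the most
    advanced and least advanced streams differ by no more than max_async.
    """
    return [
        combo
        for combo in product(range(num_positions), repeat=num_streams)
        if max(combo) - min(combo) <= max_async
    ]


# max_async = 0 keeps only fully synchronous configurations.
sync_only = allowed_joint_states(num_positions=3, num_streams=2, max_async=0)

# max_async = 1 lets one stream lead or lag the other by a single position.
one_step = allowed_joint_states(num_positions=3, num_streams=2, max_async=1)

# A bound as large as the position range leaves the streams unconstrained.
unconstrained = allowed_joint_states(num_positions=3, num_streams=2, max_async=2)
```

Under these assumptions, a bound of zero corresponds to a state-synchronous setting, and raising the bound enlarges the set of joint configurations the model may pass through.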

Current ETRO Authors

Prof. Hichem Sahli

+32 (0)02 629 291

hsahli@etrovub.be


©2024 • Vrije Universiteit Brussel • ETRO Dept. • Pleinlaan 2 • 1050 Brussels • Tel: +32 2 629 2930 (secretariat) • Fax: +32 2 629 2883