Audio Visual Emotion Recognition Based on Triple-Stream Dynamic Bayesian Network Models
Host Publication: Affective Computing and Intelligent Interaction
Authors: D. Jiang, Y. Cui, X. Zhang, P. Fan, I. Gonzalez and H. Sahli
Publisher: Springer
Publication Year: 2011
Number of Pages: 10
ISBN: 978-3-642-24599-2
Abstract: We present a triple-stream DBN model (T_AsyDBN) for audio-visual emotion recognition, in which the two audio feature streams are synchronous, while they are asynchronous with the visual feature stream within controllable constraints. MFCC features and the principal component analysis (PCA) coefficients of local prosodic features are used for the audio streams. For the visual stream, 2D facial features as well as 3D facial animation unit features are defined and concatenated, and the feature dimensions are reduced by PCA. Emotion recognition experiments on the eNTERFACE'05 database show that by adjusting the asynchrony constraint, the proposed T_AsyDBN model obtains an 18.73% higher correct recognition rate than the traditional multi-stream state synchronous HMM (MSHMM), and 10.21% higher than the two-stream asynchronous DBN model (Asy_DBN).
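As an illustration of the visual-stream preprocessing the abstract describes (concatenating 2D facial features with 3D facial animation unit features, then reducing the dimensionality with PCA), here is a minimal sketch in Python. The function name, the NumPy-based SVD implementation, and the feature dimensions are illustrative assumptions, not details from the paper:

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project (n_samples, n_dims) feature vectors onto the top
    n_components principal components, computed via SVD.
    Illustrative helper; not from the paper."""
    centered = features - features.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

# Hypothetical per-frame visual features: 24 values from 2D facial
# features concatenated with 6 values from 3D facial animation units.
rng = np.random.default_rng(0)
visual = np.hstack([rng.normal(size=(40, 24)),   # 2D facial features
                    rng.normal(size=(40, 6))])   # 3D animation units
reduced = pca_reduce(visual, 10)
print(reduced.shape)
```

The reduced vectors would then serve as the observations of the visual stream in the DBN; the actual feature definitions and PCA dimensionality used in the paper are not given in this abstract.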