ETRO VUB

ETRO Publications

Full Details

Journal Publication

An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech

This publication appears in: Speech Communication

Authors: M. Shami and W. Verhelst

Volume: 49

Pages: 201-212

Publication Date: Mar. 2007


Abstract:

In this study, the robustness of approaches to the automatic classification of emotions in speech is addressed. Among the many types of emotions that exist, two groups of emotions are considered: adult-to-adult acted vocal expressions of common types of emotions like happiness, sadness, and anger, and adult-to-infant vocal expressions of affective intents, also known as "motherese". Specifically, we estimate the generalization capability of two feature extraction approaches: the approach developed for Sony's robotic dog AIBO (AIBO) and the segment-based approach (SBA) of [Shami, M., Kamel, M., 2005. Segment based approach to the recognition of emotions in speech. In: IEEE Conf. on Multimedia and Expo (ICME05), Amsterdam, The Netherlands]. Three machine learning approaches are considered, K-nearest neighbors (KNN), support vector machines (SVM), and AdaBoosted decision trees, and four emotional speech databases are employed: the Kismet, BabyEars, Danish, and Berlin databases. Single-corpus experiments show that the considered feature extraction approaches, AIBO and SBA, are competitive on the four databases considered and that their performance is comparable with previously published results on the same databases. The best choice of machine learning algorithm seems to depend on the feature extraction approach considered. Multi-corpus experiments are performed with the Kismet–BabyEars and the Danish–Berlin database pairs, which contain parallel emotional classes. Automatic clustering of the emotional classes in the database pairs shows that the patterns behind the emotions in the Kismet–BabyEars pair are less database dependent than the patterns in the Danish–Berlin pair. In off-corpus testing, the classifier is trained on one database of a pair and tested on the other. This provides little improvement over baseline classification. In integrated-corpus testing, however, the classifier is learned on the merged databases, and this gives promisingly robust classification results, which suggest that emotional corpora with parallel emotion classes recorded under different conditions can be used to construct a single classifier capable of distinguishing the emotions in the merged corpora. Such a classifier is more robust than a classifier learned on a single corpus, as it can recognize more varied expressions of the same emotional classes. These findings suggest that the existing approaches for the classification of emotions in speech are efficient enough to handle larger amounts of training data without any reduction in classification accuracy.
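The three evaluation protocols named in the abstract (single-corpus, off-corpus, and integrated-corpus testing) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two "corpora" are synthetic 2-D point clouds rather than the Kismet, BabyEars, Danish, or Berlin data, the `offset` parameter is a hypothetical stand-in for differing recording conditions, and the plain KNN below is not the authors' implementation or feature set.

```python
import random
from collections import Counter


def knn_predict(train, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test samples whose KNN prediction matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def make_corpus(rng, offset, n_per_class=40):
    """Synthetic stand-in for an emotional speech corpus (hypothetical data).

    Two classes with 2-D features; `offset` shifts every feature to mimic
    a corpus recorded under different conditions.
    """
    centers = {"sadness": (0.0, 0.0), "anger": (2.0, 2.0)}
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            point = (cx + offset + rng.gauss(0, 0.5),
                     cy + offset + rng.gauss(0, 0.5))
            data.append((point, label))
    rng.shuffle(data)
    return data


def split(data, frac=0.5):
    """Split a corpus into a training part and a held-out test part."""
    cut = int(len(data) * frac)
    return data[:cut], data[cut:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
corpus_a = make_corpus(rng, offset=0.0)
corpus_b = make_corpus(rng, offset=1.5)
a_train, a_test = split(corpus_a)
b_train, b_test = split(corpus_b)

results = {
    # Single-corpus: train and test on parts of the same corpus.
    "single_corpus_a": accuracy(a_train, a_test),
    # Off-corpus: train on one corpus of the pair, test on the other.
    "off_corpus_a_to_b": accuracy(a_train, b_test),
    # Integrated-corpus: train on the merged corpora, test on both.
    "integrated": accuracy(a_train + b_train, a_test + b_test),
}

for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

The sketch only shows how the train and test sets are assembled under each protocol; the accuracies it prints depend entirely on the synthetic data and say nothing about the results reported in the paper.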
