Abstract
An important research direction in speech technology is robust cross-corpus and cross-language emotion recognition. In this paper, we propose computationally efficient and effective feature normalization strategies for the challenging task of cross-corpus acoustic emotion recognition. In particular, we deploy a cascaded normalization approach that combines linear speaker-level, nonlinear value-level, and feature-vector-level normalization to minimize speaker- and corpus-related effects and to maximize class separability with linear kernel classifiers. We use extreme learning machine classifiers on five corpora representing five languages from different families, namely Danish, English, German, Russian and Turkish. Using a standard set of suprasegmental features, the proposed normalization strategies outperform the benchmark normalization approaches commonly used in the literature.
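To make the cascade concrete, the following is a minimal sketch of how three such stages could be chained. The abstract does not name the exact functions, so every choice here is an assumption made for illustration: per-speaker z-scoring for the linear speaker-level step, a signed power transform with a hypothetical exponent `alpha` for the nonlinear value-level step, and L2 scaling for the feature-vector-level step. The function names are likewise hypothetical, and the classifier stage is not shown.

```python
import math
from collections import defaultdict


def speaker_z_norm(features, speakers):
    """Assumed linear speaker-level step: z-score each feature
    using the statistics of the speaker who produced the row."""
    by_speaker = defaultdict(list)
    for row, spk in zip(features, speakers):
        by_speaker[spk].append(row)

    stats = {}
    for spk, rows in by_speaker.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # Population standard deviation; fall back to 1.0 for constant features.
        stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
                for col, m in zip(cols, means)]
        stats[spk] = (means, stds)

    return [[(v - m) / s for v, m, s in zip(row, *stats[spk])]
            for row, spk in zip(features, speakers)]


def power_norm(features, alpha=0.5):
    """Assumed nonlinear value-level step: sign(x) * |x|**alpha."""
    return [[math.copysign(abs(v) ** alpha, v) for v in row]
            for row in features]


def l2_norm(features):
    """Assumed feature-vector-level step: scale each row to unit length."""
    out = []
    for row in features:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out


def cascade(features, speakers, alpha=0.5):
    """Apply the three assumed steps in order."""
    return l2_norm(power_norm(speaker_z_norm(features, speakers), alpha))


# Two speakers whose raw feature ranges differ by orders of magnitude.
X = [[1.0, 10.0], [3.0, 30.0], [100.0, 5.0], [300.0, 7.0]]
speakers = ["a", "a", "b", "b"]
Z = cascade(X, speakers)
```

In this toy input the two speakers have very different raw scales, yet after the cascade their corresponding rows coincide, which is the kind of speaker-related variation the normalization is meant to reduce.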
Original language | English |
---|---|
Pages (from-to) | 1028-1034 |
Number of pages | 7 |
Journal | Neurocomputing |
Volume | 275 |
Publication status | Published - 31 Jan 2018 |
Funding
This research is partially supported by the Russian Foundation for Basic Research (project № 16-37-60100) and by the Council for Grants of the President of Russia (project № MD-254.2017.8).
Keywords
- Acoustic emotion recognition
- Cross-corpus adaptation
- Extreme learning machines