Abstract
Acoustic emotion recognition is a popular and central research direction in paralinguistic analysis, due to its relation to a wide range of affective states/traits and its manifold applications. Developing highly generalizable models remains a challenge for researchers and engineers because of the multitude of nuisance factors. To ensure generalization, deployed models need to handle spontaneous speech recorded under acoustic conditions different from those of the training set. This requires that the models be tested for cross-corpus robustness. In this work, we first investigate the suitability of Long Short-Term Memory (LSTM) models trained with time- and space-continuously annotated affective primitives for cross-corpus acoustic emotion recognition. We next employ an effective approach that uses the frame-level valence and arousal predictions of LSTM models for utterance-level affect classification, and apply this approach to the ComParE 2018 challenge corpora. The proposed method alone gives encouraging results on both the development and test sets of the Self-Assessed Affect Sub-Challenge. On the development set, the cross-corpus prediction-based method boosts performance when fused with the top components of the baseline system. The results indicate the suitability of the proposed method for both time-continuous and utterance-level cross-corpus acoustic emotion recognition tasks.
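The abstract does not detail how frame-level valence/arousal predictions are turned into utterance-level inputs; a common way to bridge the two granularities is to pool the per-frame trajectories into a fixed-length vector of statistical functionals. The sketch below is a minimal, hypothetical illustration of that pooling step (the function name, the choice of functionals, and the dimensionality are assumptions, not the paper's exact recipe):

```python
import numpy as np

def utterance_features(frame_preds):
    """Pool frame-level (valence, arousal) predictions into one
    utterance-level feature vector via simple statistical functionals.

    frame_preds: array of shape (n_frames, 2), columns = (valence, arousal),
    e.g. the per-frame outputs of a pretrained LSTM regressor.
    """
    frame_preds = np.asarray(frame_preds, dtype=float)
    feats = []
    for dim in range(frame_preds.shape[1]):
        x = frame_preds[:, dim]
        feats.extend([
            x.mean(),          # average level over the utterance
            x.std(),           # variability of the trajectory
            x.min(),
            x.max(),
            x[-1] - x[0],      # overall rise/fall across the utterance
        ])
    return np.array(feats)

# Example: 100 frames with rising valence and flat arousal
preds = np.column_stack([np.linspace(-0.2, 0.4, 100),
                         np.full(100, 0.1)])
feats = utterance_features(preds)   # 5 functionals x 2 dimensions = 10 features
```

The resulting fixed-length vector can then be fed to any standard utterance-level classifier (e.g. an SVM), which is what makes cross-corpus, time-continuous predictors usable for discrete affect classification tasks.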
Original language | English |
---|---|
Title of host publication | INTERSPEECH-2018 |
Pages | 521-525 |
Number of pages | 5 |
Volume | 2018-September |
DOIs | |
Publication status | Published - 1 Sept 2018 |
Event | 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India Duration: 2 Sept 2018 → 6 Sept 2018 |
Conference
Conference | 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 |
---|---|
Country/Territory | India |
City | Hyderabad |
Period | 2/09/18 → 6/09/18 |
Funding
Participation in the ComParE 2018 challenge with experiments on the USoMS corpus (Section 4) was supported exclusively by the Russian Science Foundation (Project No. 18-11-00145). The rest of the research was supported by the Huawei Innovation Research Program (Agreement No. HO2017050001BM).
Keywords
- Computational paralinguistics
- Context modeling
- Cross-corpus emotion recognition
- LSTM
- Speech emotion recognition