TY - GEN
T1 - Context Modeling for Cross-Corpus Dimensional Acoustic Emotion Recognition
T2 - 20th International Conference on Speech and Computer, SPECOM 2018
AU - Fedotov, Dmitrii
AU - Kaya, Heysem
AU - Karpov, Alexey
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Recently, the focus of research in the field of affective computing has shifted to spontaneous interactions and time-continuous annotations. Such data expand the possibilities for real-world emotion recognition in the wild, but also introduce new challenges. Affective computing is a research area where data collection is neither a trivial nor a cheap task; it is therefore rational to use all the available data. However, due to the subjective nature of emotions and to differences in cultural and linguistic features as well as environmental conditions, combining affective speech data is not a straightforward process. In this paper, we analyze the difficulties of automatic emotion recognition in a time-continuous, dimensional scenario using data from the RECOLA, SEMAINE and CreativeIT databases. We propose to employ a simple but effective strategy called “mixup” to overcome the gap in feature-target and target-target covariance structures across corpora. We showcase the performance of our system in three different cross-corpus experimental setups: single-corpus training, two-corpora training, and training on augmented (mixed-up) data. Our findings show that the prediction behavior of trained models heavily depends on the covariance structure of the training corpus, and that mixup is very effective in improving the cross-corpus acoustic emotion recognition performance of context-dependent LSTM models.
KW - Cross-corpus emotion recognition
KW - Data augmentation
KW - Time-continuous emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85053808792&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-99579-3_17
DO - 10.1007/978-3-319-99579-3_17
M3 - Conference contribution
AN - SCOPUS:85053808792
SN - 9783319995786
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 155
EP - 165
BT - Speech and Computer - 20th International Conference, SPECOM 2018, Proceedings
A2 - Potapova, Rodmonga
A2 - Jokisch, Oliver
A2 - Karpov, Alexey
PB - Springer
Y2 - 18 September 2018 through 22 September 2018
ER -