Abstract
Emotions occur in complex social interactions, so processing isolated utterances may not be sufficient to grasp the nature of the underlying emotional states. Dialog speech provides useful contextual information that explains nuances of emotions and their transitions. Context can be defined on different levels; this paper proposes a hierarchical context modelling approach based on an RNN-LSTM architecture, which models acoustical context on the frame level and the partner's emotional context on the dialog level. The method is shown to be effective, together with a cross-corpus training setup and a domain adaptation technique, in a set of speaker-independent cross-validation experiments on the IEMOCAP corpus for three-level activation and valence classification. As a result, the state of the art on this corpus is advanced for both dimensions using only the acoustic modality.
Original language | English |
---|---|
Title of host publication | 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings |
Publisher | IEEE |
Pages | 6700-6704 |
Number of pages | 5 |
Volume | 2019-May |
ISBN (Electronic) | 9781479981311 |
Publication status | Published - 1 May 2019 |
Externally published | Yes |
Event | 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom. Duration: 12 May 2019 → 17 May 2019 |
Conference
Conference | 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 |
---|---|
Country/Territory | United Kingdom |
City | Brighton |
Period | 12/05/19 → 17/05/19 |
Funding
Acknowledgements. The study is supported by the Russian Science Foundation (project No. 18-11-00145) and Huawei Innovation Research Program.
Keywords
- context modelling
- cross-corpus
- dialog systems
- emotion recognition
- LSTM