Hierarchical Two-level Modelling of Emotional States in Spoken Dialog Systems

Oxana Verkholyak, Dmitrii Fedotov, Heysem Kaya, Yang Zhang, Alexey Karpov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Emotions occur in complex social interactions, so processing isolated utterances may not be sufficient to grasp the nature of the underlying emotional states. Dialog speech provides useful contextual information that explains nuances of emotions and their transitions. Context can be defined on different levels; this paper proposes a hierarchical context modelling approach based on an RNN-LSTM architecture, which models acoustic context on the frame level and the partner's emotional context on the dialog level. The method proves effective, together with a cross-corpus training setup and a domain adaptation technique, in a set of speaker-independent cross-validation experiments on the IEMOCAP corpus for three-level activation and valence classification. As a result, the state of the art on this corpus is advanced for both dimensions using only the acoustic modality.
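
To make the two-level idea in the abstract concrete, below is a minimal PyTorch-style sketch. It is not the authors' implementation: the feature dimensions, hidden sizes, and the way the partner's emotion labels are concatenated at the dialog level are all assumptions made only for illustration.

```python
# Illustrative sketch of a hierarchical two-level LSTM (not the paper's code).
# Level 1: a frame-level LSTM encodes the acoustic frames of each utterance.
# Level 2: a dialog-level LSTM runs over the utterance embeddings, concatenated
# with a (hypothetical) one-hot encoding of the partner's emotional context,
# and predicts a 3-class activation or valence label per utterance.
import torch
import torch.nn as nn


class HierarchicalEmotionModel(nn.Module):
    def __init__(self, n_acoustic=40, n_partner=3, hidden=64, n_classes=3):
        super().__init__()
        # Frame-level encoder: acoustic frame sequence -> utterance embedding
        self.frame_lstm = nn.LSTM(n_acoustic, hidden, batch_first=True)
        # Dialog-level encoder: utterance embeddings + partner context -> states
        self.dialog_lstm = nn.LSTM(hidden + n_partner, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, frames, partner_context):
        # frames: (n_utterances, n_frames, n_acoustic) for one speaker in a dialog
        # partner_context: (n_utterances, n_partner) one-hot partner emotions
        _, (h_utt, _) = self.frame_lstm(frames)           # h_utt: (1, n_utt, hidden)
        utt_emb = h_utt.squeeze(0)                        # (n_utt, hidden)
        dialog_in = torch.cat([utt_emb, partner_context], dim=-1).unsqueeze(0)
        dialog_out, _ = self.dialog_lstm(dialog_in)       # (1, n_utt, hidden)
        return self.classifier(dialog_out.squeeze(0))     # (n_utt, n_classes)


# Toy usage: 5 utterances of 100 frames each, 40-dim acoustic features
model = HierarchicalEmotionModel()
frames = torch.randn(5, 100, 40)
partner = torch.eye(3)[torch.randint(0, 3, (5,))]
print(model(frames, partner).shape)  # torch.Size([5, 3])
```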

Original language: English
Title of host publication: 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher: IEEE
Pages: 6700-6704
Number of pages: 5
Volume: 2019-May
ISBN (Electronic): 9781479981311
DOIs
Publication status: Published - 1 May 2019
Externally published: Yes
Event: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 12 May 2019 – 17 May 2019

Conference

Conference: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country/Territory: United Kingdom
City: Brighton
Period: 12/05/19 – 17/05/19

Funding

Acknowledgements. The study is supported by the Russian Science Foundation (project No. 18-11-00145) and the Huawei Innovation Research Program.

Keywords

  • context modelling
  • cross-corpus
  • dialog systems
  • emotion recognition
  • LSTM
