An Emotional Respiration Speech Dataset

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Natural interaction with human-like embodied agents, such as social robots or virtual agents, relies on the generation of realistic non-verbal behaviours, including body language, gaze and facial expressions. Humans can read and interpret somatic social signals, such as blushing or changes in the respiration rate and depth, as part of such non-verbal behaviours. Studies show that realistic breathing changes in an agent improve the communication of emotional cues, but there are scarcely any databases for affect analysis with breathing ground truth to learn how affect and breathing correlate. Emotional speech databases typically contain utterances coloured by emotional intonation, instead of natural conversation, and lack breathing annotations. In this paper, we introduce the Emotional Speech Respiration Dataset, collected from 20 subjects in a spontaneous speech setting where emotions are elicited via music. Four emotion classes (happy, sad, annoying, calm) are elicited, with 20 minutes of data per participant. The breathing ground truth is collected with piezoelectric respiration sensors, and affective labels are collected via self-reported valence and arousal levels. Along with these, we extract and share visual features of the participants (such as facial keypoints, action units, gaze directions), transcriptions of the speech instances, and paralinguistic features. Our analysis shows that the music-induced emotions show significant changes in the levels of valence for all four emotions, compared to the baseline. Furthermore, the breathing patterns change significantly with happy music, but the changes with other elicitors are less prominent. We believe this resource can be used with different embodied agents to signal affect via simulated breathing.

Original language: English
Title of host publication: ICMI '22 Companion
Subtitle of host publication: Companion Publication of the 2022 International Conference on Multimodal Interaction
Editors: Raj Tumuluri, Nicu Sebe, Gopal Pingali
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 70-78
Number of pages: 9
ISBN (Electronic): 978-1-4503-9389-8
Publication status: Published - 7 Nov 2022
Event: 24th ACM International Conference on Multimodal Interaction, ICMI 2022 - Bangalore, India
Duration: 7 Nov 2022 to 11 Nov 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 24th ACM International Conference on Multimodal Interaction, ICMI 2022
Country/Territory: India
City: Bangalore
Period: 7/11/22 to 11/11/22

Keywords

  • datasets
  • embodied agents
  • emotion elicitation
  • emotions
  • respiration
  • social agents
