The Influence of Blind Source Separation on Mixed Audio Speech and Music Emotion Recognition

Casper Laugs, Hendrik Vincent Koops, Daan Odijk, Heysem Kaya, Anja Volk

Research output: Contribution to conference, Paper, Academic

Abstract

While both speech emotion recognition and music emotion recognition have been studied extensively in different communities, little research has addressed the recognition of emotion from mixed audio sources, i.e., when both speech and music are present. However, many application scenarios, such as television content, require models that can extract emotions from mixed audio sources. This paper studies how mixed audio affects both speech and music emotion recognition using a random forest and a deep neural network model, and investigates whether blind source separation of the mixed signal beforehand is beneficial. We created a mixed audio dataset with 25% speech-music overlap and no contextual relationship between the two. We show that specialized models for speech-only or music-only audio achieved merely 'chance-level' performance on mixed audio. For speech, above chance-level performance was achieved when models were trained on raw mixed audio, but the best performance was achieved with audio that was blind source separated beforehand. Music emotion recognition models on mixed audio achieve performance approaching or even surpassing performance on music-only audio, with and without blind source separation. Our results are important for estimating emotion from real-world data, where individual speech and music tracks are often not available.
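
To make the idea of a mixed clip with partial speech-music overlap concrete, the following is a minimal sketch only. The abstract does not describe the actual mixing procedure, so everything here is an assumption: the function name `mix_with_overlap`, the `music_gain` parameter, the reading of "25% overlap" as music covering the final quarter of a speech clip, and the synthetic sine tones that stand in for real recordings.

```python
import numpy as np

def mix_with_overlap(speech, music, overlap=0.25, music_gain=0.5):
    """Overlay music on the final `overlap` fraction of a speech clip.

    Hypothetical helper: this is one possible reading of "25% overlap",
    not the procedure used in the paper.
    """
    speech = np.asarray(speech, dtype=np.float64)
    music = np.asarray(music, dtype=np.float64)
    n_overlap = int(round(overlap * len(speech)))
    if n_overlap > len(music):
        raise ValueError("music clip is shorter than the overlap region")
    mixed = speech.copy()
    if n_overlap > 0:
        # Add attenuated music to the tail of the speech signal.
        mixed[-n_overlap:] += music_gain * music[:n_overlap]
    # Rescale only if the sum would clip outside [-1, 1].
    peak = np.max(np.abs(mixed))
    if peak > 1.0:
        mixed /= peak
    return mixed, n_overlap

# Synthetic stand-ins for one second of speech and music at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
speech = 0.3 * np.sin(2 * np.pi * 220 * t)
music = 0.3 * np.sin(2 * np.pi * 440 * t)

mixed, n_overlap = mix_with_overlap(speech, music)
```

In this sketch the first three quarters of `mixed` are untouched speech and only the last quarter contains both sources, which is the region a blind source separation step would need to untangle before a speech-only or music-only model is applied.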
Original language: English
Pages: 67-71
Number of pages: 5
Publication status: Published - 25 Oct 2020
Event: ICMI 2020 Late Breaking Results - Virtual Event, Utrecht, Netherlands
Duration: 25 Oct 2020 to 29 Oct 2020
https://icmi.acm.org/2021/index.php?id=cflbr

Workshop

Workshop: ICMI 2020 Late Breaking Results
Abbreviated title: ICMI20LBR
Country/Territory: Netherlands
City: Utrecht
Period: 25/10/20 to 29/10/20

Keywords

  • speech emotion recognition
  • music emotion recognition
  • blind source separation
