Abstract
Music as a form of art is intentionally composed to be emotionally expressive. The emotional features of music are invaluable for music indexing and recommendation. In this paper we present a cross-comparison of automatic emotional analysis of music. We created a public dataset of Creative Commons licensed songs. Using the valence-arousal model, the songs were annotated both in terms of the emotions expressed by the whole excerpt and dynamically with 1 Hz temporal resolution. Each song received 10 annotations on Amazon Mechanical Turk, and the annotations were averaged to form a ground truth. Four different systems from three teams and the organizers were employed to tackle this problem in an open challenge. We compare their performances and discuss best practices. While the effect of a larger feature set was not very apparent in static emotion estimation, the combination of a comprehensive feature set and a recurrent neural network that models temporal dependencies largely outperformed the other proposed methods for dynamic music emotion estimation.
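The abstract describes forming a ground truth by averaging the 10 crowdsourced annotations per song at 1 Hz. A minimal sketch of that averaging step, with entirely illustrative data and function names (the paper does not specify an implementation):

```python
import statistics

# Hypothetical data: each annotator provides one (valence, arousal) pair
# per second of the excerpt, i.e. ratings at 1 Hz temporal resolution.
annotations = [
    [(0.2, 0.5), (0.3, 0.6), (0.4, 0.7)],  # annotator 1
    [(0.1, 0.4), (0.2, 0.5), (0.3, 0.6)],  # annotator 2
    [(0.3, 0.6), (0.4, 0.7), (0.5, 0.8)],  # annotator 3
]

def ground_truth(annotations):
    """Average valence and arousal across annotators at each time step."""
    n_seconds = len(annotations[0])
    result = []
    for t in range(n_seconds):
        valence = statistics.mean(a[t][0] for a in annotations)
        arousal = statistics.mean(a[t][1] for a in annotations)
        result.append((round(valence, 3), round(arousal, 3)))
    return result

print(ground_truth(annotations))
```

The same per-annotator averaging applies to the static (whole-excerpt) annotations, which are just the single-time-step case.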
| Original language | English |
|---|---|
| Title of host publication | MM '14 Proceedings of the ACM International Conference on Multimedia |
| Publisher | Association for Computing Machinery |
| Pages | 1161-1164 |
| ISBN (Print) | 978-1-4503-3063-3 |
| DOIs | |
| Publication status | Published - 2014 |
| Event | ACM Multimedia, United States<br>Duration: 3 Nov 2014 → 7 Nov 2014 |
Conference
| Conference | ACM Multimedia |
|---|---|
| Country/Territory | United States |
| Period | 3/11/14 → 7/11/14 |
Keywords
- Music
- emotion
- crowdsourcing
- audio features
- music emotion recognition
- performance evaluation