Abstract
More and more music is becoming available digitally, increasing the need to navigate through large numbers of audio tracks easily. One approach for improving the browsing experience is music thumbnailing: the procedure of finding a continuous fragment that can represent the whole musical piece. This paper proposes a human-centred approach to creating thumbnails based on listeners’ perception, directly asking listeners to identify the most characteristic fragment. We carried out a user study to assign representativeness scores to multiple fragments from a selection of popular music tracks. To strengthen the results, we replicated the same user study with new participants and a different set of music. Thereafter, we used audio features, a segmentation algorithm, and participants’ overall familiarity with the songs to predict representativeness scores. The results suggest that neither segmentation nor familiarity has a significant impact on users’ thumbnail preferences: even segments with starting points that pay no regard to song structure can be suitable thumbnails. Three high-level audio characteristics, however, do impact the perceived representativeness of a fragment: Raw Intensity, Melodic Conventionality, and Conventionality of Intensity. Based on these findings, we propose a new, easy-to-apply method for music thumbnailing.
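The findings can be illustrated with a small Python sketch that combines them: candidate fragments start on a fixed time grid that ignores song structure, and each is ranked by a score built from the three high-level characteristics. This is not the method proposed in the paper, whose details the abstract does not give. The names `Fragment`, `candidate_starts`, `score`, `pick_thumbnail` and `WEIGHTS` are hypothetical. The feature values are assumed to be precomputed on a common scale, and the equal weights are placeholders, not fitted coefficients.

```python
from dataclasses import dataclass

# Hypothetical placeholder weights (NOT the paper's fitted values).
WEIGHTS = {
    "raw_intensity": 1.0,
    "melodic_conventionality": 1.0,
    "intensity_conventionality": 1.0,
}


@dataclass
class Fragment:
    start_s: float  # fragment start time in seconds
    # Assumed precomputed feature values, all on a common scale.
    raw_intensity: float
    melodic_conventionality: float
    intensity_conventionality: float


def candidate_starts(duration_s: float, fragment_len_s: float, hop_s: float) -> list[float]:
    """Start times on a fixed grid that ignores song structure."""
    if duration_s < fragment_len_s:
        return []
    n = int((duration_s - fragment_len_s) // hop_s) + 1
    # Multiply instead of accumulating to avoid floating-point drift.
    return [i * hop_s for i in range(n)]


def score(f: Fragment, weights: dict[str, float] = WEIGHTS) -> float:
    """Hypothetical linear representativeness score from the three characteristics."""
    return (
        weights["raw_intensity"] * f.raw_intensity
        + weights["melodic_conventionality"] * f.melodic_conventionality
        + weights["intensity_conventionality"] * f.intensity_conventionality
    )


def pick_thumbnail(fragments: list[Fragment]) -> Fragment:
    """Return the candidate fragment with the highest score."""
    return max(fragments, key=score)
```

A linear score keeps the sketch easy to apply. In practice the weights would come from a model fitted to listener ratings, such as the representativeness scores collected in the user studies.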
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 21st International Society for Music Information Retrieval Conference |
| Place of Publication | Montreal |
| Pages | 223-230 |
| Publication status | Published - 2020 |