Abstract
For MIREX 2013, the evaluation of audio chord estimation
(ACE) followed a new scheme. Using chord vocabularies
of differing complexity as well as segmentation measures,
the new scheme provides more information than the ACE
evaluations from previous years. With this new information,
however, come new interpretive challenges. What
are the correlations among different songs and, more importantly,
different submissions across the new measures?
Performance falls off for all submissions as the vocabularies
increase in complexity, but does it do so directly in proportion
to the number of more complex chords, or are certain
algorithms indeed more robust? What are the outliers, that is, the
song-algorithm pairs where the performance was substantially
higher or lower than would be predicted, and how can they
be explained? Answering these questions requires moving
beyond the Friedman tests that have most often been
used to compare algorithms to a richer underlying model.
We propose a logistic-regression approach for generating
comparative statistics for MIREX ACE, supported with generalised
estimating equations (GEEs) to correct for repeated
measures. We use the MIREX 2013 ACE results as a case
study to illustrate our proposed method, including some of
the interesting aspects of the evaluation that might not be apparent
from the headline results alone.
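
As a rough illustration of the kind of model the abstract describes, the following sketch fits a logistic regression to per-song chord recall proportions and computes cluster-robust (sandwich) standard errors, treating each song as a cluster of repeated measures. This is the simplest GEE variant, with an independence working correlation, and is an assumption made here for brevity; the paper's actual model specification may differ. The function names, the two-algorithm design, and the four scores per algorithm are all hypothetical toy values, not MIREX results.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logit_cluster_robust(X, y, clusters, iters=25):
    """Logit fit by Newton's method, with cluster-robust standard errors.

    Equivalent to a GEE with an independence working correlation
    (an assumption of this sketch, not a claim about the paper).
    """
    p = len(X[0])

    def moments(beta):
        mu = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
              for row in X]
        info = [[sum(r[j] * r[k] * m * (1.0 - m) for r, m in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        return mu, info

    beta = [0.0] * p
    for _ in range(iters):
        mu, info = moments(beta)
        score = [sum(r[j] * (yi - m) for r, yi, m in zip(X, y, mu))
                 for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]

    # Sandwich covariance: info^-1 * (sum of per-cluster score outer products) * info^-1.
    mu, info = moments(beta)
    sums = {}
    for r, yi, m, c in zip(X, y, mu, clusters):
        s = sums.setdefault(c, [0.0] * p)
        for j in range(p):
            s[j] += r[j] * (yi - m)
    meat = [[sum(s[j] * s[k] for s in sums.values()) for k in range(p)]
            for j in range(p)]
    inv = [solve(info, [float(i == j) for i in range(p)]) for j in range(p)]
    cov = [[sum(inv[j][a] * meat[a][b] * inv[b][k]
                for a in range(p) for b in range(p))
            for k in range(p)] for j in range(p)]
    se = [math.sqrt(cov[j][j]) for j in range(p)]
    return beta, se

# Hypothetical toy data: proportion of correctly labelled time per song.
songs = ["s1", "s2", "s3", "s4"]
scores_a = [0.80, 0.60, 0.70, 0.90]  # hypothetical algorithm A
scores_b = [0.70, 0.50, 0.65, 0.75]  # hypothetical algorithm B

# Design matrix: intercept plus an indicator for algorithm B.
X = [[1.0, 0.0] for _ in songs] + [[1.0, 1.0] for _ in songs]
y = scores_a + scores_b
clusters = songs + songs  # each song is measured once per algorithm

beta, se = fit_logit_cluster_robust(X, y, clusters)
print("coefficients:", beta)
print("cluster-robust standard errors:", se)
```

With only an intercept and one indicator, the fitted coefficients reproduce the logits of the two group means, so the point estimates match what an ordinary logistic regression would give. What changes is the standard errors, which account for the fact that the same songs are scored under every algorithm.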
Original language | English |
---|---|
Title of host publication | Proceedings of the 15th Conference of the International Society for Music Information Retrieval (ISMIR 2014) |
Subtitle of host publication | October 27 - 31, 2014 Taipei, Taiwan |
Editors | Hsin-Min Wang, Yi-Hsuan Yang, Jin Ha Lee |
Pages | 525-530 |
Number of pages | 6 |
Publication status | Published - 2014 |
Event | International Society for Music Information Retrieval Conference - Taipei, Taiwan, Province of China; Duration: 27 Oct 2014 → 31 Oct 2014 |
Conference
Conference | International Society for Music Information Retrieval Conference |
---|---|
Country/Territory | Taiwan, Province of China |
City | Taipei |
Period | 27/10/14 → 31/10/14 |