Abstract
This paper presents a model for predicting expressive
accentuation in piano performances with neural networks.
Using Restricted Boltzmann Machines (RBMs), features
are learned from performance data, after which these
features are used to predict performed loudness. During
feature learning, data describing more than 6000 musical
pieces is used; when training for prediction, two datasets
are used, both recorded on a Bösendorfer piano (accurately
measuring note on- and offset times and velocity values),
but describing different compositions performed by
different pianists. The resulting model is tested by predicting
note velocity for unseen performances. Our approach
differs from earlier work in a number of ways: (1) an
additional input representation based on a local history of
velocity values is used, (2) the RBMs are trained to
result in a network with sparse activations, (3) network
connectivity is increased by adding skip-connections, and (4)
more data is used for training. These modifications result
in a network that outperforms the state of the art on
the same data and in more descriptive features, which can be
used for rendering performances or for gaining insight into
which aspects of a musical piece influence its performance.
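
To make modification (1) concrete, the following is a minimal sketch of one way a local history of velocity values could be turned into an input vector per note. It is not the authors' implementation: the function name `velocity_history`, the window length `k`, and the zero padding for the first notes are all assumptions made for illustration.

```python
def velocity_history(velocities, k=3, pad=0.0):
    """Return, for each note, the k velocities that precede it (oldest first).

    Hypothetical helper: the window length k and the padding value are
    illustrative assumptions, not values taken from the paper.
    """
    # Prepend k padding values so the first notes also get a full window.
    padded = [pad] * k + [float(v) for v in velocities]
    # Row i is padded[i : i + k], which corresponds to velocities[i - k : i].
    return [padded[i:i + k] for i in range(len(velocities))]


# Example: four notes with MIDI-style velocities and a window of two notes.
history = velocity_history([60, 72, 65, 80], k=2)
```

Each row of `history` could then be concatenated with score-based features before being passed to the network, so that the prediction for a note is conditioned on the loudness of the notes just played.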
Original language | English |
---|---|
Title of host publication | Proceedings of the 15th Conference of the International Society for Music Information Retrieval (ISMIR 2014) |
Subtitle of host publication | October 27 - 31, 2014 Taipei, Taiwan |
Editors | Hsin-Min Wang, Yi-Hsuan Yang, Jin Ha Lee |
Publisher | International Society for Music Information Retrieval |
Pages | 45-52 |
Number of pages | 6 |
Publication status | Published - 2014 |
Event | International Society for Music Information Retrieval Conference - Taipei, Taiwan, Province of China Duration: 27 Oct 2014 → 31 Oct 2014 |
Conference
Conference | International Society for Music Information Retrieval Conference |
---|---|
Country/Territory | Taiwan, Province of China |
City | Taipei |
Period | 27/10/14 → 31/10/14 |