Abstract—The Transformer model has gained popularity because of its ability to solve a wide range of tasks, one of which is automatic music generation. Many studies have shown that this model can generate music with a consistent structure, but the generated pieces still lack emotion. In this paper, we extend a Transformer-based model to generate music with controllable emotion. Emotion is divided into three categories: negative, neutral, and positive. We train the model on 120 MIDI files from our new piano dataset, which is labeled by emotion; the labeling is done manually by listening. The dataset contains 210 MIDI files in total, but we filter them so that only 120 remain.
We also add a new token to the REvamped MIDI-derived events (REMI) representation to encode emotion. The experimental results show that human subjects agree that a Transformer-XL model using REMI with the emotion token is able to generate emotion-based music. We also compare our generated pieces with pieces generated from other datasets; the results show that the majority of respondents prefer the pieces generated using our dataset.
Keywords—music generation, controllable music generation,
transformer