<div class="csl-bib-body">
<div class="csl-entry">Damböck, M., Vogl, R., &amp; Knees, P. (2022). On the Impact and Interplay of Input Representations and Network Architectures for Automatic Music Tagging. In P. Rao, H. Murphy, A. Srinivasamurthy, R. Bittner, R. Caro Repetto, M. Goto, X. Serra, &amp; M. Miron (Eds.), <i>Proceedings of the 23rd International Society for Music Information Retrieval Conference. ISMIR 2022</i> (pp. 941–948). International Society for Music Information Retrieval. https://doi.org/10.5281/zenodo.7343091</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/177655
-
dc.description.abstract
Automatic music tagging systems have once more gained relevance in recent years, not least through their use in applications such as music recommender systems. State-of-the-art systems are based on a variant of convolutional neural networks (CNNs) and use some type of time-frequency audio representation as input, in a fitting combination to predict semantic tags available through expert or crowd-based annotation. In this work we systematically compare five widely used audio input representations (STFT, CQT, Mel spectrograms, MFCCs, and raw audio waveform) using five established convolutional neural network architectures (MusicCNN, VGG16, ResNet, a Squeeze and Excitation Network (SeNet), as well as a newly proposed MusicCNN variant using dilated convolutions) for the task of music tag prediction. The performance of all factor combinations is measured on two distinct tagging datasets, namely MagnaTagATune and MTG Jamendo. A two-way ANOVA shows that both input representation and model architecture significantly impact the classification results. Despite differently sized input representations and practical impact on model training, we find that using STFT as the input representation provides the best results overall and on specific tag categories (genre, instrument, mood), while other representations show less consistent behavior in these regards. Furthermore, the proposed dilated convolutional architecture shows significant performance improvements for all input representations except raw waveform.
en
dc.description.sponsorship
Fonds zur Förderung der wissenschaftlichen Forschung (FWF)
-
dc.language.iso
en
-
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
-
dc.subject
Music Information Retrieval
en
dc.subject
Auto-tagging
en
dc.subject
Deep Learning
en
dc.subject
Audio signal representations
en
dc.title
On the Impact and Interplay of Input Representations and Network Architectures for Automatic Music Tagging
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.rights.license
Creative Commons Namensnennung 4.0 International
de
dc.rights.license
Creative Commons Attribution 4.0 International
en
dc.contributor.editoraffiliation
Amazon (United States), United States of America (the)
-
dc.relation.isbn
978-1-7327299-2-6
-
dc.relation.doi
10.5281/zenodo.7676768
-
dc.description.startpage
941
-
dc.description.endpage
948
-
dc.relation.grantno
P 33526-N
-
dc.rights.holder
Maximilian Damböck, Richard Vogl, Peter Knees
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
Proceedings of the 23rd International Society for Music Information Retrieval Conference. ISMIR 2022
-
tuw.peerreviewed
true
-
tuw.relation.publisher
International Society for Music Information Retrieval
-
tuw.project.title
Empfehlungssystem & Nutzer: Hin zu gegenseitigem Verständnis
-
tuw.researchTopic.id
I4a
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E194-04 - Forschungsbereich Data Science
-
tuw.publisher.doi
10.5281/zenodo.7343091
-
dc.identifier.libraryid
AC17204552
-
dc.description.numberOfPages
8
-
tuw.author.orcid
0000-0003-3906-1292
-
dc.rights.identifier
CC BY 4.0
de
dc.rights.identifier
CC BY 4.0
en
tuw.editor.orcid
0000-0002-9032-7909
-
tuw.editor.orcid
0000-0003-2251-2202
-
tuw.event.name
23rd International Society for Music Information Retrieval Conference
en
tuw.event.startdate
04-12-2022
-
tuw.event.enddate
08-12-2022
-
tuw.event.online
Hybrid
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Bengaluru
-
tuw.event.country
IN
-
tuw.event.presenter
Vogl, Richard
-
tuw.event.track
Single Track
-
wb.sciencebranch
Informatik
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.value
100
-
item.languageiso639-1
en
-
item.openairetype
conference paper
-
item.grantfulltext
open
-
item.fulltext
with Fulltext
-
item.cerifentitytype
Publications
-
item.mimetype
application/pdf
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.openaccessfulltext
Open Access
-
crisitem.author.dept
TU Wien
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.orcid
0000-0003-3906-1292
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering