Multi-modal music information retrieval: augmenting audio-analysis with visual computing for improved music Video analysis

Schindler, Alexander

doi:10.34726/hss.2019.72065

Record citation link:

https://doi.org/10.34726/hss.2019.72065
http://hdl.handle.net/20.500.12708/4496

Title:

Multi-modal music information retrieval: augmenting audio-analysis with visual computing for improved music Video analysis

Citation:

Schindler, A. (2019). Multi-modal music information retrieval: augmenting audio-analysis with visual computing for improved music Video analysis [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.72065

reposiTUm-DOI:

10.34726/hss.2019.72065

CatalogPlus:

AC15508631

Publication type:

Thesis - Dissertation

Language:

English

Authors:

Schindler, Alexander

Supervisor:

Rauber, Andreas

Organizational unit:

E194 - Institut für Information Systems Engineering

Date (published):

2019

Extent:

169

Keywords:

Music Information Retrieval; Multi-Modal Information Retrieval; Audio-Visual Analysis; Machine Learning

Abstract:

This thesis focuses on harnessing the information provided by the visual layer of music videos to augment and improve tasks in the research domain of Music Information Retrieval (MIR). The main hypothesis of this work is based on the observation that certain expressive categories, such as genre or theme, can be recognized solely from the visual content, without the sound being heard. This leads to the hypothesis that there exists a visual language that is used to express mood or genre. It can further be concluded that this visual information is music-related and should therefore be beneficial for corresponding MIR tasks such as music genre classification or mood recognition. The validation of these hypotheses is first based on a literature search in the Musicology and Music Psychology research domains to identify production processes in music videos or visual branding in the music business. The analytical approach is based on a series of comprehensive experiments and evaluations of visual features concerning their ability to describe music-related information. These evaluations range from low-level visual features to high-level concepts. Additionally, new visual features that capture rhythmic visual patterns are introduced. Experimental results showed that the developed audio-visual approaches improved over the audio-based benchmark in the conducted experiments for the three prominent MIR tasks Artist Identification, Music Genre and Cross-Genre Classification. Finally, the experimental results were compared to the findings from the literature review, which revealed correlations between identified production processes and quantitatively determined audio-visual correlations. Thus, well-known and documented visual stereotypes (e.g., cowboy hat/Country music, swimsuit/Dance, fire/Heavy Metal), the choice of particular colours, and theme-specific symbols could be confirmed.

License:

Protected by copyright

Appears in collections:

Thesis