Title: Multi-modal music information retrieval: augmenting audio-analysis with visual computing for improved music video analysis
Language: English
Authors: Schindler, Alexander 
Qualification level: Doctoral
Advisor: Rauber, Andreas 
Issue Date: 2019
Number of Pages: 169
This thesis focuses on harnessing the information provided by the visual layer of music videos to augment and improve tasks in the research domain of Music Information Retrieval (MIR). The main hypothesis of this work is based on the observation that certain expressive categories, such as genre or theme, can be recognized from the visual content alone, without hearing the sound. This suggests that there exists a visual language used to express mood or genre. It follows that this visual information is music-related and should therefore benefit corresponding MIR tasks such as music genre classification or mood recognition. To validate these hypotheses, a literature search in the research domains of musicology and music psychology was first conducted to identify production processes in music videos and visual branding in the music business. The analytical approach is based on a series of comprehensive experiments and evaluations of visual features with respect to their ability to describe music-related information. These evaluations range from low-level visual features to high-level concepts. Additionally, new visual features are introduced that capture rhythmic visual patterns. The experimental results showed that the developed audio-visual approaches improved over the audio-based benchmark for the three prominent MIR tasks of Artist Identification, Music Genre Classification and Cross-Genre Classification. Finally, the experimental results were compared to the findings of the literature review; this comparison revealed correspondences between the identified production processes and the quantitatively determined audio-visual correlations. In this way, well-known and documented visual stereotypes (e.g., cowboy hat/Country music, swimsuit/Dance, fire/Heavy Metal), the choice of particular colours, and theme-specific symbols could be confirmed.
Keywords: Music Information Retrieval; Multi-Modal Information Retrieval; Audio-Visual Analysis; Machine Learning
URI: https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-130913
Library ID: AC15508631
Organisation: E194 - Institut für Information Systems Engineering 
Publication Type: Thesis
Appears in Collections:Thesis