Musical instrument separation

Grecu, Andrei

Record link:

http://hdl.handle.net/20.500.12708/184144

Title:

Musical instrument separation

Citation:

Grecu, A. (2007). Musical instrument separation [Master Thesis, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/184144

CatalogPlus:

AC05035755

Publication Type:

Thesis - Master's thesis

Language:

English

Authors:

Grecu, Andrei

Advisor:

Rauber, Andreas

Co-advisor:

Lidy, Thomas

Organisational Unit:

E188 - Institut für Softwaretechnik und Interaktive Systeme

Date (published):

2007

Number of Pages:

138

Keywords:

Music; Audio; Instrument Separation; Template-Matching; Blind Source Separation; Onset-Detection; Binaural

Abstract:

The human brain can solve the problem of separating the instruments within a piece of music relatively easily. For computers, however, it is still a difficult problem for which no satisfactory solution has yet been found. Our goal is therefore to find ways to analyze musical pieces in formats such as mp3, to separate the instruments using various stereo features and certain assumptions about the structure of music, and finally to save the results into several audio tracks. Our contribution consists of three algorithms. The template-based algorithm assumes that the tones of each instrument are limited in number throughout the piece and must therefore be repeated to achieve a certain tonal variety. This redundancy can be exploited to model tones by means of templates, and the algorithm tries to reconstruct the piece with as few templates and onsets as possible. Finally, the templates have to be grouped into instruments. The second approach serves as an improvement: we assume that the onset vector does not necessarily have to be found with the help of relevance heuristics, but can instead be allowed to self-organize iteratively. The third approach belongs to the field of blind source separation. It operates on stereo features in the frequency spectrum, which yields a feature space that is easily visualized with histograms. Assuming that the instruments do not move during the performance, the feature space should exhibit clusters. By identifying these clusters automatically, the frequencies falling into them can be separated, so that everything located at the corresponding spatial position can be isolated. This approach, however, is not new in the literature. Our improvement is to represent frequencies by colours, in contrast to the black-and-white histograms used previously.
In addition, we cluster the histogram automatically using a radial basis function network (RBFN). The evaluation of two of these algorithms on different corpora shows encouraging results: their separation performance is roughly four times higher than that of the baseline. This led to the idea, for future work, of combining the concepts of the first and third approaches into a new algorithm.
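
The template-matching idea in this abstract (a tone modeled as a spectrogram template that recurs at a few onsets, with the onset vector left to self-organize iteratively) can be illustrated with a minimal sketch. It assumes a single, already known magnitude-spectrogram template and borrows the Euclidean multiplicative update rule from non-negative matrix factor deconvolution; that update rule, the optional L1 `sparsity` penalty and the function names `reconstruct` and `fit_onsets` are hypothetical illustrations, not the thesis's own implementation.

```python
import numpy as np


def reconstruct(template, onsets, n_frames):
    """Convolve a (freq x length) spectrogram template with a 1-D onset vector."""
    n_freq, length = template.shape
    approx = np.zeros((n_freq, n_frames))
    for tau in range(length):
        # onsets delayed by tau frames, weighted by template column tau
        shifted = np.zeros(n_frames)
        shifted[tau:] = onsets[:n_frames - tau]
        approx += np.outer(template[:, tau], shifted)
    return approx


def fit_onsets(spec, template, n_iter=500, sparsity=0.0, eps=1e-12):
    """Let a non-negative onset vector self-organize so that the template
    convolved with it approximates spec (Euclidean multiplicative updates)."""
    n_frames = spec.shape[1]
    length = template.shape[1]
    onsets = np.ones(n_frames)
    for _ in range(n_iter):
        approx = reconstruct(template, onsets, n_frames)
        num = np.zeros(n_frames)
        den = np.zeros(n_frames)
        for tau in range(length):
            # correlate template column tau with the frames it would cover
            num[:n_frames - tau] += template[:, tau] @ spec[:, tau:]
            den[:n_frames - tau] += template[:, tau] @ approx[:, tau:]
        # an L1 penalty in the denominator pushes unneeded onsets towards zero
        onsets *= num / (den + sparsity + eps)
    return onsets


# Usage: a toy spectrogram built from one random template at three onsets
rng = np.random.default_rng(0)
template = rng.uniform(0.1, 1.0, size=(6, 3))
true_onsets = np.zeros(40)
true_onsets[[5, 20, 31]] = [1.0, 0.5, 2.0]
spec = reconstruct(template, true_onsets, 40)
onsets = fit_onsets(spec, template, n_iter=2000)
```

Setting `sparsity` above zero drives unused onsets towards zero faster, loosely mirroring the goal of explaining the piece with as few onsets as possible; how strongly to penalize is a tuning choice.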

The problem of separating the instruments in a musical piece is solved easily by the human brain. For computers, on the other hand, this task is still difficult, and no general solution exists at the time of writing. Our goal is therefore to find solutions that make the best use of the limited power of today's computers to analyze a musical performance given in a common format such as mp3, separate the instruments using two different stereo cues together with some assumptions about the structure of music, and finally save the result into several tracks. We approach this goal with three separation algorithms, two of which exploit different properties of music. The first, a template matching algorithm, assumes that the tones of an instrument are limited in number throughout a song and therefore have to be repeated in order to create diversity. This redundancy can be exploited by modeling the tones with templates and trying to reconstruct the musical piece with as few templates and onsets as possible, which in turn should lead to a solution in which each template matches one tone. The second algorithm improves on the first: we assume that the onset vector does not need to be found using relevance heuristics but can instead be allowed to self-organize iteratively, which is why we call it the iterative template matching algorithm (ITM). The third approach is a blind source separation algorithm that uses stereo cues in the frequency domain, which form an easily visualizable feature space. Assuming that the instruments do not move during the performance, the resulting histogram will show clusters. By identifying these clusters, we can separate the frequencies falling into them and thus separate whatever is located at the spatial position corresponding to each cluster region in the histogram.
This approach is not new in the literature. Our improvements are to use the frequency to colour the histogram, adding information to the black-and-white histograms used previously, and to cluster the histogram automatically using a radial basis function network (RBFN). The evaluation results for two of these algorithms on different corpora are very promising: their separation performance is about four times higher than that of a simple baseline used for comparison. Finally, we propose as future work the idea of unifying the concepts of our first and third approaches in a new algorithm.
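
The stereo-cue separation in the third approach can be illustrated with a minimal sketch of panning-based time-frequency masking. It assumes an amplitude-panned (instantaneous) stereo mix, uses only the level ratio between the channels as the stereo cue, and replaces the thesis's coloured histogram and RBFN clustering with simple peak picking on an energy-weighted histogram; the function names, window size, hop size and bin count are hypothetical choices for illustration.

```python
import numpy as np

N_FFT, HOP = 512, 256
# periodic sqrt-Hann: analysis * synthesis window = Hann, which sums to 1 at 50% overlap
WINDOW = np.sqrt(np.hanning(N_FFT + 1)[:-1])


def stft(x):
    """Short-time Fourier transform, shape (frames, frequency bins)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec):
    """Weighted overlap-add inverse of stft()."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    out = np.zeros(HOP * (len(frames) - 1) + N_FFT)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out


def separate_by_panning(left, right, n_sources, n_bins=64):
    """Split a stereo mix into up to n_sources stereo estimates by panning angle."""
    L, R = stft(left), stft(right)
    # stereo cue per time-frequency bin: 0 = hard left, pi/2 = hard right
    angle = np.arctan2(np.abs(R), np.abs(L))
    weight = np.abs(L) + np.abs(R)
    hist, edges = np.histogram(angle, bins=n_bins, range=(0.0, np.pi / 2), weights=weight)
    # strongest local maxima of the histogram (stand-in for the RBFN clustering)
    padded = np.concatenate(([-1.0], hist, [-1.0]))
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])
    peak_bins = np.flatnonzero(is_peak)
    strongest = peak_bins[np.argsort(hist[peak_bins])[::-1][:n_sources]]
    centres = np.sort((edges[strongest] + edges[strongest + 1]) / 2)
    # binary mask: each time-frequency bin goes to the nearest cluster centre
    label = np.argmin(np.abs(angle[..., None] - centres), axis=-1)
    return [(istft(np.where(label == k, L, 0)), istft(np.where(label == k, R, 0)))
            for k in range(len(centres))]


# Usage: two sine "instruments", one panned left and one panned right
fs = 8000
t = np.arange(fs) / fs
a = np.sin(2 * np.pi * 440 * t)
b = np.sin(2 * np.pi * 1500 * t)
left, right = 0.9 * a + 0.2 * b, 0.1 * a + 0.8 * b
estimates = separate_by_panning(left, right, n_sources=2)  # sorted left to right
```

Because every time-frequency bin is assigned to exactly one cluster, the estimates add back up to the mix apart from the edge frames; soft (weighted) masks would be a natural variation when sources overlap in frequency.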

Additional information:

Summary in German

Appears in Collections:

Thesis
