Large-scale bird song identification using convolutional neural networks

Fazekas, Botond

doi:10.34726/hss.2018.55981

Citable link to this record:

https://doi.org/10.34726/hss.2018.55981
http://hdl.handle.net/20.500.12708/1821

Title:

Large-scale bird song identification using convolutional neural networks

Citation:

Fazekas, B. (2018). Large-scale bird song identification using convolutional neural networks [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.55981

reposiTUm-DOI:

10.34726/hss.2018.55981

CatalogPlus:

AC15056010

Publication type:

Thesis - Diploma Thesis

Language:

English

Authors:

Fazekas, Botond

Supervisor:

Lidy, Thomas

Co-supervisors:

Rauber, Andreas
Schindler, Alexander

Organizational unit:

E194 - Institut für Information Systems Engineering

Date (published):

2018

Extent:

Keywords:

Large-scale,

Abstract:

Understanding wildlife populations is important for understanding ecosystems; monitoring them, however, is difficult and time-consuming. Automatically capturing environmental sounds is easy, but the wildlife in the recordings must subsequently be identified. Doing this manually is cumbersome, so automated methods are a promising field of research. Birds are especially well suited to this task: singing is their main means of communication, and they are important ecological indicators because they respond quickly to changes in their environment. The currently available datasets contain a large number of bird songs from different species in various environments; hence, besides classification accuracy, the scalability of the methods is also an important factor. The aim of this study is to improve upon the state-of-the-art acoustic bird identification methods evaluated in the BirdCLEF2016 competition, in terms of both identification accuracy and required training time, and therefore scalability. The work describes the pre-processing steps used to separate bird songs from background noise, and it evaluates the performance of the proposed simpler convolutional neural network (CNN) models with Rectified Linear Units (ReLU) and Exponential Linear Units (ELU) on the BirdCLEF2017 dataset, along with the effect of Mel-scaling and Constant-Q transformation of the sounds. Furthermore, a novel multi-modal architecture is proposed that incorporates the various metadata available for the field recordings. The results show that the simpler CNN model with ELUs substantially improves on the training time and classification performance of the state-of-the-art solutions, while using metadata has a major positive effect on identification accuracy.
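
For readers unfamiliar with the two activation functions named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not taken from the thesis: the function names `relu` and `elu` and the default `alpha = 1.0` are assumptions chosen for illustration only.

```python
import math


def relu(x: float) -> float:
    """Rectified Linear Unit: keeps positive inputs, maps the rest to zero."""
    return max(0.0, x)


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential Linear Unit: identity for positive inputs, smooth
    saturation towards -alpha for negative inputs."""
    # alpha = 1.0 is an assumed default, not a value reported in the thesis.
    return x if x > 0 else alpha * math.expm1(x)


if __name__ == "__main__":
    # Compare both activations on a few sample inputs.
    for value in (-2.0, -0.5, 0.0, 1.5):
        print(f"x={value:+.1f}  relu={relu(value):.4f}  elu={elu(value):.4f}")
```

Unlike ReLU, the ELU returns small negative values for negative inputs rather than a hard zero, which is the property that distinguishes the two model variants compared in the abstract.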

Additional information:

Deviating title as translated by the author

License:

In copyright

Appears in collections:

Thesis