<div class="csl-bib-body">
<div class="csl-entry">Naghibzadeh-Jalali, S.-A. (2018). <i>Sound event detection with deep neural networks</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.42625</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2018.42625
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/5458
-
dc.description
Abweichender Titel nach Übersetzung der Verfasserin/des Verfassers
-
dc.description.abstract
Acoustic Sound Event Detection (SED) has been extensively studied over the past years and is considered an emerging topic in Computational Auditory Scene Analysis (CASA) research, which relates to the cocktail party effect. SED systems try to replicate this ability of the human brain, which enables humans to detect events occurring in the environmental sounds around them. Therefore, these systems are trained to classify sound events in the input audio signals. A sound event is a label used by humans to describe and identify an event in an audio sequence. The methodology used in this thesis is Artificial Neural Networks (ANNs), which have already shown robust performance on complicated tasks such as Speech Recognition, Natural Language Processing and Image Classification. Different audio input representations such as the Constant-Q Transform, Mel Frequency Cepstral Coefficients (MFCC) and the Mel Spectrogram were also tested, of which the Mel Spectrogram proved to be the best representation among those mentioned. The ANN architectures studied in this work are the Recurrent Neural Network (RNN) and its extension, the Long Short-Term Memory (LSTM), and the Convolutional Neural Network (CNN). The RNN architecture was chosen for its ability to capture the temporal behaviour of its inputs, and the CNN architecture for its ability to learn high-level features through its convolutional layers. To generalize the constructed models, data augmentation was performed, and the dropout technique was applied to avoid overfitting. To evaluate the performance of these models, two datasets provided by the DCASE community for their DCASE 2017 challenge were used. The experimental results of this thesis show the robustness of deep neural networks in comparison with the conventional Multilayer Perceptron and Support Vector Machines, which serve as the baseline systems.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
deep learning
en
dc.subject
deep neural networks
en
dc.subject
audio event detection
en
dc.subject
sound event detection
en
dc.subject
acoustic event detection
en
dc.subject
event detection
en
dc.title
Sound event detection with deep neural networks
en
dc.title.alternative
Akustische Szenenanalyse mit Deep Neural Networks
de
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2018.42625
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Seyedeh-Anahid Naghibzadeh-Jalali
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E188 - Institut für Softwaretechnik und Interaktive Systeme
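The abstract reports that the Mel Spectrogram was the best-performing input representation among those tested. Purely as an illustration (this is not the author's code, and none of it comes from the thesis), the following is a minimal NumPy sketch of log-Mel feature extraction of the kind typically fed to the CNN/RNN models described; all parameter choices (16 kHz sample rate, 512-point FFT, 256-sample hop, 40 Mel bands) are assumptions, not values taken from the work.

```python
import numpy as np

def hz_to_mel(f):
    # Standard HTK-style mel scale conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # then project onto the mel filterbank and log-compress
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)

# Demo: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = mel_spectrogram(y, sr=sr)
print(S.shape)  # (time frames, mel bands)
```

The resulting (frames × bands) matrix is the kind of 2-D time-frequency input that a CNN can treat as an image and an RNN/LSTM can consume frame by frame, which matches the architectural motivations given in the abstract.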