Title: Perceptual modeling: factors influencing speech intelligibility in a multitalker environment and applications in speech separation
Language: English
Authors: Kainz, Andrea 
Qualification level: Diploma
Advisor: Rattay, Frank 
Issue Date: 2017
Number of Pages: 72
Abstract: 
The aim of this thesis is to investigate speech intelligibility in multitalker environments, where the listener must focus on one speaker in the presence of simultaneous interfering talkers or background noise in order to follow the conversation. In general, this is not a difficult task for normal-hearing people, but it can be a challenge for people with hearing impairment. Furthermore, dealing with interfering speech signals remains a problem for machines. This thesis presents different speech segregation algorithms and their mathematical and statistical background. There are different approaches to processing interfering speech signals. Motivated by the powerful ability of the auditory system to analyze and segregate incoming sounds, Computational Auditory Scene Analysis (CASA) aims at replicating the different auditory processing stages. Another essential approach to separating interfering speech signals, which differs from CASA, is Blind Source Separation (BSS), which uses results from statistics and information theory to separate a signal mixture into its sources. In the experimental part of the thesis, a speech intelligibility (SI) test implemented in MATLAB® (R2015b) was performed. The aim was to investigate factors affecting speech intelligibility, with the main focus on attributes of the masker signals and their influence on the perception of the target signal. Twelve normal-hearing listeners participated in the test; their task was to identify the target signals in the presence of different masker signals. The target signals consisted of 14 nonsense syllables (e.g. 'affa' or 'assa') from the Oldenburger Logatome Corpus (OLLO) spoken by four female speakers. The masker signals included sentences from the Oldenburger Satztest (e.g. 'Britta verleiht elf alte Bilder'), the International Speech Test Signal (ISTS) and Speech Shaped Noise (SSN).
The test was evaluated using a two-way repeated-measures analysis of variance (ANOVA) in SPSS® Statistics (24) with the two within-subject factors "Signal-to-Noise Ratio" (SNR) and "Masker Type". The results showed a significant main effect of both factors (p < 0.001). In further analyses, ANOVA also showed a significant influence of the factors "Number of Maskers" (p < 0.001) and "Spectral Diversity of the Masker" (p < 0.001) on speech intelligibility.
Keywords: speech intelligibility; Computational Auditory Scene Analysis; Oldenburger Logatome Corpus; Speech Shaped Noise
URI: https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-99525
http://hdl.handle.net/20.500.12708/3806
Library ID: AC13727229
Organisation: E101 - Institut für Analysis und Scientific Computing 
Publication Type: Thesis
Hochschulschrift
Appears in Collections: Thesis

Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.