Kainz, A. (2017). Perceptual modeling: factors influencing speech intelligibility in a multitalker environment and applications in speech separation [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2017.41204
This thesis investigates speech intelligibility in multitalker environments, where the listener must focus on one speaker in the presence of simultaneous interfering talkers or background noise in order to follow the conversation. This is generally not difficult for normal-hearing people, but it can be a challenge for people with hearing impairment, and dealing with interfering speech signals remains a problem for machines. The thesis presents different speech segregation algorithms together with their mathematical and statistical background. There are several approaches to processing interfering speech signals. Motivated by the powerful ability of the auditory system to analyze and segregate incoming sounds, Computational Auditory Scene Analysis (CASA) aims to replicate the different auditory processing stages. Another essential approach, which differs from CASA, is Blind Source Separation (BSS), which uses results from statistics and information theory to separate a signal mixture into its sources. In the experimental part of the thesis, a speech intelligibility (SI) test was implemented in MATLAB® (R2015b) and carried out. The aim was to investigate factors affecting speech intelligibility, with the main focus on attributes of the masker signals and their influence on the perception of the target signal. Twelve normal-hearing listeners participated in the test; their task was to identify the target signals in the presence of different masker signals. The target signals consisted of 14 nonsense syllables (e.g. 'affa' or 'assa') from the Oldenburger Logatome Corpus (OLLO) spoken by four female speakers. The masker signals included sentences from the Oldenburger Satztest (e.g. 'Britta verleiht elf alte Bilder'), the International Speech Test Signal (ISTS) and Speech Shaped Noise (SSN).
The test was evaluated using a two-way repeated-measures analysis of variance (ANOVA) in SPSS® Statistics (24) with the two within-subject factors "Signal-to-Noise Ratio" (SNR) and "Masker Type". The results showed a significant main effect of both factors (p < 0.001). In further analyses, ANOVA also showed a significant influence of the factors "Number of Maskers" (p < 0.001) and "Spectral Diversity of the Masker" (p < 0.001) on speech intelligibility.
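To make the "Signal-to-Noise Ratio" factor concrete, the following is a minimal sketch of how a target and a masker could be mixed at a prescribed SNR. It is not taken from the thesis, whose test was implemented in MATLAB; the sketch assumes the common definition SNR = 10·log10(P_target / P_masker) with P the mean signal power, and it assumes that the masker is rescaled while the target level stays fixed. The function names `mean_power`, `snr_db` and `mix_at_snr` are hypothetical.

```python
import math


def mean_power(signal):
    """Mean power of a signal: the average of its squared samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(target, masker):
    """SNR in dB, assuming SNR = 10*log10(P_target / P_masker)."""
    return 10.0 * math.log10(mean_power(target) / mean_power(masker))


def mix_at_snr(target, masker, desired_snr_db):
    """Rescale the masker so the target-to-masker SNR equals desired_snr_db.

    Assumption: the target level is left unchanged and only the masker is scaled.
    Returns (mixture, scaled_masker).
    """
    if len(target) != len(masker):
        raise ValueError("target and masker must have the same length")
    p_target = mean_power(target)
    p_masker = mean_power(masker)
    if p_target == 0.0 or p_masker == 0.0:
        raise ValueError("signals must have non-zero power")
    # Masker power needed for the desired SNR: p_target / 10^(SNR/10).
    # Power scales with the square of the amplitude gain, hence the sqrt.
    gain = math.sqrt(p_target / (p_masker * 10.0 ** (desired_snr_db / 10.0)))
    scaled_masker = [gain * m for m in masker]
    mixture = [t + m for t, m in zip(target, scaled_masker)]
    return mixture, scaled_masker


# Synthetic stand-ins for a target syllable and a masker (not real speech data).
target = [math.sin(2.0 * math.pi * 5.0 * n / 200.0) for n in range(200)]
masker = [0.3 * math.sin(2.0 * math.pi * 13.0 * n / 200.0 + 0.7) for n in range(200)]
mixture, scaled_masker = mix_at_snr(target, masker, -5.0)
```

A negative SNR, as in the usage lines, means the scaled masker carries more power than the target. Other conventions, such as scaling the target or computing levels over speech-active segments only, are equally plausible, and the abstract does not say which one was used.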