Transformation and interpolation of language varieties for speech synthesis

Toman, Markus

doi:10.34726/hss.2016.25509

Record link:

https://doi.org/10.34726/hss.2016.25509
http://hdl.handle.net/20.500.12708/6058

Title:

Transformation and interpolation of language varieties for speech synthesis

Citation:

Toman, M. (2016). Transformation and interpolation of language varieties for speech synthesis [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2016.25509

CatalogPlus:

AC13088333

Publication Type:

Thesis - Dissertation

Language:

English

Authors:

Toman, Markus

Advisor:

Rauber, Andreas

Organisational Unit:

E188 - Institut für Softwaretechnik und Interaktive Systeme

Date (published):

11-Jan-2016

Number of Pages:

124

Keywords:

Speech Processing; Speech Synthesis; Hidden Markov Model; Language Varieties; Dialects; Voice Conversion

Abstract:

This thesis aims to advance the field of speech synthesis by investigating and developing new concepts for acoustic modeling, transformation, and interpolation of language varieties (e.g. dialects, sociolects, foreign accents). The goal is to enable systems with speech output to adapt to the individual needs and preferences of their users. Transformation of language varieties aims to convert a voice model from one variety into a model of another variety while retaining the voice characteristics. Interpolation between multiple voice models of different varieties makes it possible to generate intermediate varieties. Both approaches are used to widen the range of speaking styles available to speech output systems. Further, two specific applications are investigated in this thesis: foreign accent reduction and the generation of intelligible fast speech for visually impaired users. All presented methods are evaluated through listening tests and, where appropriate, objective measures. To conduct these experiments, phone sets and recording scripts for three Austrian German dialects were created, and speech corpora from selected native dialect speakers were recorded in studio quality. We present a method for unsupervised dialect interpolation and show that listeners are able to correctly perceive the changes in the degree of dialect for different settings of the interpolation parameter. We show that, with the methods presented here, dialects can be transformed while retaining the original speaker characteristics. We also compare different approaches for generating fast synthetic speech. Our experiments show that linearly compressed natural speech signals are more intelligible than fast speech naturally produced by our professional speakers. Overall, this thesis shows how adaptive modeling can be applied to control and modify the language variety of a speech synthesis system.

Additional information:

Summary in German
Differing title as translated by the author

License:

In Copyright

Appears in Collections:

Thesis