In this thesis we examine the autonomous oscillator model for the synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model; the combination of the oscillator model with inverse filtering, both of which significantly increase the number of 'successfully' re-synthesized speech signals; and the introduction of a new technique suitable for re-generating the noise-like signal component in speech signals.

Nonlinear function models are compared in a one-dimensional modeling task with regard to their suitability for adequate re-synthesis of speech signals, in particular considering stability. The considerations also cover the structure of the nonlinear functions, with a view to possible interpolation between models for different speech sounds. Regarding both the stability of the oscillator and the premise of a nonlinear function structure that may be pre-defined, RBF networks are found to be a preferable choice. In particular, in combination with a Bayesian training algorithm, RBF networks with Gaussian basis functions outperform other nonlinear function models with respect to the requirements for application in the oscillator model.

The application of inverse filtering, in particular linear prediction as a model for speech production, in addition to nonlinear oscillator modeling allows the oscillator to model an estimated speech source signal as evoked by the oscillatory motion of the vocal folds. The combination of linear prediction inverse filtering and the nonlinear oscillator model is shown to provide a significantly higher number of stably re-synthesized vowel signals, and better spectral reconstruction, than the oscillator model applied to the full speech signal. However, for wide-band speech signals the reconstruction of the high-frequency band is still unsatisfactory. A closer analysis makes clear that, while the oscillatory component can now be reproduced satisfactorily, a model for the noise-like component of speech signals is still missing.

Our remedy is to extend the oscillator model by a nonlinear predictor used to re-generate the amplitude-modulated noise-like signal component of stationary mixed-excitation speech signals (including vowels and voiced fricatives). The resulting 'oscillator-plus-noise' model is able to re-generate vowel signals as well as voiced fricative signals with high fidelity in terms of time-domain waveform, signal trajectory in phase space, and spectral characteristics. Moreover, owing to the automatic determination of a zero oscillatory component, unvoiced fricatives are also reproduced adequately, as the noise-like component only. With one instance of the proposed model, all kinds of stationary speech sounds can be re-synthesized by applying model parameters (i.e., the RBF network weights and linear prediction filter coefficients) learned from a natural speech signal for each sound.

In a first objective analysis of the naturalness of signals generated by the oscillator-plus-noise model, measures of short-term variations in fundamental frequency and amplitude are found to resemble those of the original signal more closely than for the oscillator model alone, suggesting an improvement in naturalness.
de
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Sprachsignal
de
dc.subject
Sprachsynthese
de
dc.subject
Oszillator
de
dc.subject
Modell
de
dc.subject
Bayes-Lernen
de
dc.title
Oscillator-plus-noise modeling of speech signals
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Erhard Rank
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Mecklenbräuker, Wolfgang
-
tuw.publication.orgunit
E389 - Institut für Nachrichtentechnik und Hochfrequenztechnik
-
dc.type.qualificationlevel
Doctoral
-
dc.identifier.libraryid
AC04899502
-
dc.description.numberOfPages
155
-
dc.identifier.urn
urn:nbn:at:at-ubtuw:1-16628
-
dc.thesistype
Dissertation
de
dc.thesistype
Dissertation
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
item.languageiso639-1
en
-
item.openairetype
doctoral thesis
-
item.grantfulltext
open
-
item.fulltext
with Fulltext
-
item.cerifentitytype
Publications
-
item.mimetype
application/pdf
-
item.openairecristype
http://purl.org/coar/resource_type/c_db06
-
item.openaccessfulltext
Open Access
-
crisitem.author.dept
E389 - Institute of Telecommunications
-
crisitem.author.parentorg
E350 - Fakultät für Elektrotechnik und Informationstechnik