Oscillator-plus-noise modeling of speech signals

Rank, Erhard

DC Field

Value

Language

dc.contributor.advisor

Kubin, Gernot

dc.contributor.author

Rank, Erhard

dc.date.accessioned

2020-06-30T14:47:17Z

dc.date.issued

2005

dc.identifier.citation

Rank, E. (2005). Oscillator-plus-noise modeling of speech signals [Dissertation, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-16628

dc.identifier.uri

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-16628

dc.identifier.uri

http://hdl.handle.net/20.500.12708/12781

dc.description.abstract

In this thesis we examine the autonomous oscillator model for the synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model and the combination of the oscillator model with inverse filtering, both of which significantly increase the number of "successfully" re-synthesized speech signals, as well as the introduction of a new technique suitable for the re-generation of the noise-like signal component in speech signals.
Nonlinear function models are compared in a one-dimensional modeling task with regard to how well they meet the prerequisites for adequate re-synthesis of speech signals, in particular stability. The comparison also covers the structure of the nonlinear functions, with a view to possible interpolation between models for different speech sounds. Regarding both the stability of the oscillator and the premise of a nonlinear function structure that may be pre-defined, RBF networks are found to be the preferable choice. In particular, in combination with a Bayesian training algorithm, RBF networks with Gaussian basis functions outperform other nonlinear function models with respect to the requirements for application in the oscillator model.
The application of inverse filtering, in particular linear prediction as a model for speech production, in addition to nonlinear oscillator modeling allows the oscillator to model an estimated speech source signal as evoked by the oscillatory motion of the vocal folds. The combination of linear prediction inverse filtering and the nonlinear oscillator model is shown to provide a significantly higher number of stably re-synthesized vowel signals, and better spectral reconstruction, than the oscillator model applied to the full speech signal. However, for wide-band speech signals the reconstruction of the high-frequency band is still unsatisfactory. A closer analysis shows that, while the oscillatory component can now be reproduced satisfactorily, a model for the noise-like component of speech signals is still missing.
Our remedy is to extend the oscillator model by a nonlinear predictor used to re-generate the amplitude-modulated noise-like signal component of stationary mixed-excitation speech signals (including vowels and voiced fricatives). The resulting "oscillator-plus-noise" model is able to re-generate vowel signals as well as voiced fricative signals with high fidelity in terms of time-domain waveform, signal trajectory in phase space, and spectral characteristics. Moreover, due to the automatic determination of a zero oscillatory component, unvoiced fricatives are also reproduced adequately, as the noise-like component only. With one instance of the proposed model, all kinds of stationary speech sounds can be re-synthesized by applying model parameters (i.e., the RBF network weights and linear prediction filter coefficients) learned from a natural speech signal for each sound.
In a first objective analysis of the naturalness of signals generated by the oscillator-plus-noise model, measures of short-term variations in fundamental frequency and amplitude are found to resemble those of the original signal more closely than for the oscillator model alone, suggesting an improvement in naturalness.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Speech signal

dc.subject

Speech synthesis

dc.subject

Oscillator

dc.subject

Model

dc.subject

Bayesian learning

dc.title

Oscillator-plus-noise modeling of speech signals

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.contributor.affiliation

TU Wien, Austria

dc.rights.holder

Erhard Rank

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Mecklenbräuker, Wolfgang

tuw.publication.orgunit

E389 - Institut für Nachrichtentechnik und Hochfrequenztechnik

dc.type.qualificationlevel

Doctoral

dc.identifier.libraryid

AC04899502

dc.description.numberOfPages

155

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-16628

dc.thesistype

Dissertation

dc.thesistype

Dissertation

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

item.languageiso639-1

item.openairetype

doctoral thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_db06

item.openaccessfulltext

Open Access

crisitem.author.dept

E389 - Institute of Telecommunications

crisitem.author.parentorg

E350 - Fakultät für Elektrotechnik und Informationstechnik
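
The linear prediction inverse filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy two-pole "vowel" signal, and the prediction order of 2 are assumptions chosen for illustration. The sketch estimates predictor coefficients with the autocorrelation method (Levinson-Durbin recursion), inverse-filters the signal to obtain an estimated source signal, and passes that source back through the all-pole synthesis filter. In the thesis, the oscillator models the source signal; here the source is simply fed back unchanged to show that the filter pair is invertible.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations.

    Returns a with a[0] == 1 such that e[n] = sum_k a[k] * x[n - k]
    is the prediction error (inverse-filtered signal).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = sum(a[k] * r[m - k] for k in range(m))
        k_m = -acc / err                      # reflection coefficient
        prev = a[:]
        for k in range(1, m + 1):
            a[k] = prev[k] + k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a


def inverse_filter(x, a):
    """FIR inverse filter: estimated source signal from the speech signal."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1))
            for n in range(len(x))]


def synthesis_filter(e, a):
    """All-pole synthesis filter: speech signal from a source signal."""
    p = len(a) - 1
    x = []
    for n in range(len(e)):
        x.append(e[n] - sum(a[k] * x[n - k] for k in range(1, min(p, n) + 1)))
    return x


def toy_vowel(n_samples=400, period=50):
    """Hypothetical test signal: impulse train through a two-pole resonator."""
    x = []
    for n in range(n_samples):
        pulse = 1.0 if n % period == 0 else 0.0
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        x.append(pulse + 1.6 * x1 - 0.81 * x2)
    return x


x = toy_vowel()
a = levinson_durbin(autocorrelation(x, 2), 2)
source = inverse_filter(x, a)        # what the oscillator would be trained on
x_hat = synthesis_filter(source, a)  # re-synthesis from the source signal
print("estimated coefficients:", a)
print("max reconstruction error:", max(abs(u - v) for u, v in zip(x, x_hat)))
```

In the setting the abstract describes, `source` would be replaced at synthesis time by the output of the trained oscillator (plus the re-generated noise-like component) before it is passed through the synthesis filter.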

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(5.09 MB)

In Copyright

Page view(s)

240

checked on Dec 1, 2023

Download(s)

105

checked on Dec 1, 2023
