Raw audio piano synthesis with structured state space models

Dallinger, Dominik

doi:10.34726/hss.2025.128685

DC Field

Value

Language

dc.contributor.advisor

Jantsch, Axel

dc.contributor.author

Dallinger, Dominik

dc.date.accessioned

2025-05-19T07:17:46Z

dc.date.issued

2025

dc.date.submitted

2025-04

dc.identifier.citation

Dallinger, D. (2025). Raw audio piano synthesis with structured state space models [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.128685

dc.identifier.uri

https://doi.org/10.34726/hss.2025.128685

dc.identifier.uri

http://hdl.handle.net/20.500.12708/215487

dc.description.abstract

This thesis introduces Piano-SSM, a novel Structured State Space Model (SSM) architecture for real-time raw piano audio synthesis. Unlike conventional neural audio synthesis models, Piano-SSM prioritizes computational efficiency by exploiting the advantages of SSMs, such as computational complexity that scales linearly with sequence length and constant memory consumption. The proposed model synthesizes audio directly from Musical Instrument Digital Interface (MIDI) input. The network requires neither intermediate spectral representations nor domain-specific expert knowledge, which simplifies training and improves accessibility. Evaluations on the MIDI and Audio Edited for Synchronous TRacks and Organization (MAESTRO) dataset show that Piano-SSM achieves a Multi-Scale Spectral Loss (MSSL) comparable to state-of-the-art models. Moreover, evaluations on the MIDI Aligned Piano Sounds (MAPS) dataset demonstrate that the model generalizes when trained on very limited data. Further experiments on the MAESTRO dataset show that the model can be trained at a high sampling rate and then synthesize audio at lower sampling rates. Finally, using a custom C++ implementation, the thesis demonstrates that Piano-SSM can synthesize high-quality piano audio in real time.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Deep Neural Networks

dc.subject

Machine Learning

dc.subject

State Space Models

dc.subject

audio classification

dc.title

Raw audio piano synthesis with structured state space models

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2025.128685

dc.contributor.affiliation

TU Wien, Austria

dc.rights.holder

Dominik Dallinger

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Bittner, Matthias

tuw.publication.orgunit

E384 - Institut für Computertechnik

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC17522833

dc.description.numberOfPages

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.advisor.orcid

0000-0003-2251-0004

tuw.assistant.orcid

0009-0004-8022-2232

item.openairetype

master thesis

item.grantfulltext

open

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.openaccessfulltext

Open Access

item.languageiso639-1

crisitem.author.dept

E384-02 - Forschungsbereich Systems on Chip

crisitem.author.parentorg

E384 - Institut für Computertechnik

Appears in Collections:

Thesis

Dallinger Dominik - 2025 - Raw Audio Piano Synthesis with Structured State Space...pdf

Adobe PDF

(8.11 MB)
