Raw audio piano synthesis with structured state space models

Dallinger, Dominik

doi:10.34726/hss.2025.128685

Record citation link:

https://doi.org/10.34726/hss.2025.128685
http://hdl.handle.net/20.500.12708/215487

Title:

Raw audio piano synthesis with structured state space models

Citation:

Dallinger, D. (2025). Raw audio piano synthesis with structured state space models [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.128685

reposiTUm-DOI:

10.34726/hss.2025.128685

CatalogPlus:

AC17522833

Publication type:

Thesis - Diploma Thesis

Language:

English

Authors:

Dallinger, Dominik

Supervisor:

Jantsch, Axel

Co-supervisors:

Bittner, Matthias

Organizational unit:

E384 - Institut für Computertechnik

Date (published):

2025

Keywords:

Deep Neural Networks; Machine Learning; State Space Models; audio classification

Abstract:

This thesis introduces Piano-SSM, a novel Structured State Space Model (SSM) architecture for real-time raw piano audio synthesis. Unlike conventional neural audio synthesis models, Piano-SSM focuses on computational efficiency by exploiting the advantages of SSMs, such as computational complexity that is linear in the sequence length and constant memory consumption. The proposed model synthesizes audio directly from Musical Instrument Digital Interface (MIDI) input. The network requires neither intermediate spectral representations nor domain-specific expert knowledge, which simplifies training and improves accessibility. Evaluations on the MIDI and Audio Edited for Synchronous TRacks and Organization (MAESTRO) dataset show that Piano-SSM achieves a Multi-Scale Spectral Loss (MSSL) comparable to state-of-the-art models. Moreover, evaluations on the MIDI Aligned Piano Sounds (MAPS) dataset demonstrate the model’s generalization capabilities when trained on very limited data. Further experiments on the MAESTRO dataset highlight the model’s ability to be trained at a high sampling rate while synthesizing at lower sampling rates. Finally, using a custom C++ implementation, the thesis demonstrates Piano-SSM’s ability to synthesize high-quality piano audio in real time.
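
The efficiency and sampling-rate properties named in the abstract can be illustrated with a generic linear SSM recurrence. The following is a minimal sketch, not the Piano-SSM architecture: it assumes a diagonal state matrix and zero-order-hold discretization, neither of which is stated in the abstract, and all parameter values and function names are hypothetical. It shows a single pass over the input with a fixed-size state, and the same continuous-time parameters being re-discretized for a lower sampling rate.

```python
import cmath
import math


def discretize(a, b, dt):
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    a, b: lists of complex poles and input weights (one entry per mode).
    dt:   sampling period in seconds.
    """
    a_bar = [cmath.exp(ai * dt) for ai in a]
    b_bar = [(ab - 1.0) / ai * bi for ab, ai, bi in zip(a_bar, a, b)]
    return a_bar, b_bar


def ssm_scan(u, a_bar, b_bar, c):
    """Recurrence x[k] = a_bar * x[k-1] + b_bar * u[k], y[k] = Re(sum(c * x[k]))."""
    x = [0j] * len(a_bar)  # fixed-size state: memory does not grow with len(u)
    y = []
    for uk in u:  # one pass over the input: time linear in len(u)
        x = [ab * xi + bb * uk for ab, bb, xi in zip(a_bar, b_bar, x)]
        y.append(sum(ci * xi for ci, xi in zip(c, x)).real)
    return y


# Hypothetical parameters: two damped oscillatory modes (not from the thesis).
a = [complex(-50.0, 2 * math.pi * 440.0), complex(-80.0, 2 * math.pi * 880.0)]
b = [1.0 + 0j, 0.5 + 0j]
c = [1.0 + 0j, 1.0 + 0j]

# A constant (held) input, sampled at 48 kHz and at 24 kHz.
u_hi = [1.0] * 96
u_lo = [1.0] * 48

# The same continuous-time parameters, discretized for each sampling rate.
y_hi = ssm_scan(u_hi, *discretize(a, b, 1 / 48000), c)
y_lo = ssm_scan(u_lo, *discretize(a, b, 1 / 24000), c)
```

For a piecewise-constant input, zero-order hold is exact, so in this sketch every second sample of `y_hi` agrees with `y_lo` up to floating-point error. A trained network with nonlinear layers between SSM blocks would not generally preserve this exactly.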

License:

In Copyright

Appears in collections:

Thesis