A registry and benchmarking tool for lossy neural image compression

Ferrari, Dominik

doi:10.34726/hss.2025.124401

DC Field

Value

Language

dc.contributor.advisor

Dustdar, Schahram

dc.contributor.author

Ferrari, Dominik

dc.date.accessioned

2025-03-18T13:41:05Z

dc.date.issued

2025

dc.date.submitted

2025-02

dc.identifier.citation

Ferrari, D. (2025). A registry and benchmarking tool for lossy neural image compression [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.124401

dc.identifier.uri

https://doi.org/10.34726/hss.2025.124401

dc.identifier.uri

http://hdl.handle.net/20.500.12708/213281

dc.description.abstract

Recent advances in deep learning and hardware acceleration have led to the emergence of new solutions in AI-based image compression. With each new solution, it has become harder to measure the actual impact of each change and design decision. We propose a two-part solution. We develop a classification that maps the broad landscape of AI-assisted image compression. It is divided into core components such as quantization, context models, architectural archetypes, hierarchical priors, and variable-rate models. We analyze the trends and advancements to make it easier for researchers to get started and to continue their work. In addition, we develop a benchmarking tool to train and test the AI models. Our configurations are standardized and easy to modify. The testing process measures and compares variations of a configuration using different datasets and metrics. We evaluate each configuration on the CLIC and Kodak datasets. We measure R-D performance, LPIPS, PSNR, MS-SSIM, encoding latency, and encoding throughput. We observe the impact of design decisions such as changing the non-linearity core component. While the GSDN activation function in the non-linearity core component performed worse in a small network, it performed best in larger networks. Interestingly, configurations that defined larger networks were slower but achieved better R-D performance for the same compression factor.

dc.description.abstract

Recent advancements in deep learning and hardware accelerators have led to many novel solutions in learned image compression. With each novel solution, measuring the impact of fine-grained design decisions becomes increasingly challenging. We propose a two-fold solution. We develop a taxonomy that broadly defines the landscape of lossy learned image compression. We classify advancements into core components such as quantization, context models, architectural archetypes, hierarchical priors, and variable-rate models. We analyze trends and advancements to show novices and seasoned researchers the focus points of current research. In addition, we develop a benchmarking tool to train and test models. Our standardized configurations are highly customizable by defining variations. The testing pipeline compares variations across multiple datasets and metrics. We evaluate each configuration on the CLIC and Kodak datasets. We measure rate-distortion (R-D) performance, LPIPS, PSNR, MS-SSIM, encoding and decoding latency, and encoding and decoding throughput. We observe the impact of fine-grained design decisions, such as changing the non-linearity block. While the GSDN activation function within the non-linearity block performs worse in small networks, it achieves its best performance in deeper networks. Interestingly, configurations with deeper networks ran slower yet achieved higher visual quality at the same compression rates. We compare the learned image compression models with a fixed codec (BPG) to ensure comparable results. We find that the impact of minor design decisions depends on network size, resulting in vastly different performance.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

benchmarking

dc.subject

lossy compression

dc.subject

image compression

dc.subject

neural compression

dc.subject

artificial intelligence

dc.subject

software engineering

dc.subject

survey

dc.title

A registry and benchmarking tool for lossy neural image compression

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2025.124401

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Dominik Ferrari

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Furutanpey, Alireza

tuw.publication.orgunit

E194 - Institut für Information Systems Engineering

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC17468079

dc.description.numberOfPages

105

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.advisor.orcid

0000-0001-6872-8821

tuw.assistant.orcid

0000-0001-5621-7899

item.cerifentitytype

Publications

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.openaccessfulltext

Open Access

item.grantfulltext

open

item.openairetype

master thesis

item.fulltext

with Fulltext

item.languageiso639-1

item.mimetype

application/pdf

crisitem.author.dept

E194 - Institut für Information Systems Engineering

crisitem.author.parentorg

E180 - Fakultät für Informatik

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(2.76 MB)

In Copyright

