Applications of neural attention for modelling long-range dependencies

Wödlinger, Matthias Gerold

doi:10.34726/hss.2025.128663

DC Field

Value

Language

dc.contributor.advisor

Sablatnig, Robert

dc.contributor.author

Wödlinger, Matthias Gerold

dc.date.accessioned

2025-01-21T13:01:59Z

dc.date.issued

2024

dc.date.submitted

2025-01

dc.identifier.citation

Wödlinger, M. G. (2024). Applications of neural attention for modelling long-range dependencies [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.128663

dc.identifier.uri

https://doi.org/10.34726/hss.2025.128663

dc.identifier.uri

http://hdl.handle.net/20.500.12708/209197

dc.description.abstract

Dominant neural network architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are inherently local and require deep networks to incorporate distant information. Early work on visual attention allowed the modelling of non-local structures through sequences of local glimpses, and more recently, the Transformer family of networks revolutionised neural attention by using self-attention to process all inputs in parallel. This thesis explores applications of neural attention mechanisms in computer vision, focusing on modelling long-range dependencies in three domains: historical document analysis, biomedical data processing, and stereo-image compression. Moreover, it introduces new methods that use attention mechanisms to address specific challenges in these domains. A visual attention-based method for extracting text baselines from images of historical handwritten texts is presented. The proposed method relies on a network that sequentially shifts attention along text lines to determine polygon coordinates. The method allows for direct learned prediction of text baseline coordinates rather than relying on heuristics. In the biomedical domain, the thesis introduces the Flowformer model, an efficient variant of the Transformer architecture designed to process large flow cytometry samples for cancer cell detection. The proposed approach uses global attention to model samples holistically and achieves state-of-the-art performance in cancer cell identification across multiple datasets. Finally, two stereo image compression models, SASIC and ECSIC, are presented, which use cross-attention mechanisms to model the mutual information between stereo image pairs. These methods achieve state-of-the-art compression performance by efficiently capturing redundancies between images while maintaining fast runtimes.
This thesis provides explicit solutions to real-world problems in document analysis, biomedical data processing, and image compression. It also demonstrates the effectiveness of neural attention in dealing with different long-range dependencies.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Deep Learning

dc.subject

Computer Vision

dc.subject

Machine Learning

dc.subject

Transformer

dc.subject

Neural Attention

dc.subject

Image Compression

dc.subject

Bioinformatics

dc.subject

Document Analysis

dc.title

Applications of neural attention for modelling long-range dependencies

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2025.128663

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Matthias Gerold Wödlinger

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

tuw.publication.orgunit

E193 - Institut für Visual Computing and Human-Centered Technology

dc.type.qualificationlevel

Doctoral

dc.identifier.libraryid

AC17413540

dc.description.numberOfPages

114

dc.thesistype

Dissertation

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.advisor.orcid

0000-0003-4195-1593

item.languageiso639-1

item.openairetype

doctoral thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.openairecristype

http://purl.org/coar/resource_type/c_db06

item.openaccessfulltext

Open Access

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

Appears in Collections:

Thesis

Woedlinger Matthias Gerold - 2025 - Applications of Neural Attention for...pdf

Adobe PDF

(6.26 MB)

Page view(s)

315

checked on Jan 21, 2025

Download(s)

134

checked on Jan 21, 2025
