<div class="csl-bib-body">
<div class="csl-entry">Wödlinger, M. G. (2024). <i>Applications of neural attention for modelling long-range dependencies</i> [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.128663</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2025.128663
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/209197
-
dc.description.abstract
Dominant neural network architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are inherently local and require deep networks to incorporate distant information. Early work on visual attention allowed the modelling of non-local structures through sequences of local glimpses, and more recently, the Transformer family of networks revolutionised neural attention by using self-attention to process all inputs in parallel. This thesis explores applications of neural attention mechanisms in computer vision, focusing on modelling long-range dependencies in three domains: historical document analysis, biomedical data processing, and stereo-image compression. Moreover, it introduces new methods that use attention mechanisms to address specific challenges in these domains. A visual attention-based method for extracting text baselines from images of historical handwritten texts is presented. The proposed method relies on a network that sequentially shifts attention along text lines to determine polygon coordinates. The method allows for direct learned prediction of text baseline coordinates rather than relying on heuristics. In the biomedical domain, the thesis introduces the Flowformer model, an efficient variant of the Transformer architecture designed to process large flow cytometry samples for cancer cell detection. The proposed approach uses global attention to model samples holistically and achieves state-of-the-art performance in cancer cell identification across multiple datasets. Finally, two stereo image compression models, SASIC and ECSIC, are presented, which use cross-attention mechanisms to model the mutual information between stereo image pairs. These methods achieve state-of-the-art compression performance by efficiently capturing redundancies between images while maintaining fast runtimes.
This thesis provides explicit solutions to real-world problems in document analysis, biomedical data processing, and image compression. It also demonstrates the effectiveness of neural attention in dealing with different long-range dependencies.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Deep Learning
en
dc.subject
Computer Vision
en
dc.subject
Machine Learning
en
dc.subject
Transformer
en
dc.subject
Neural Attention
en
dc.subject
Image Compression
en
dc.subject
Bioinformatics
en
dc.subject
Document Analysis
en
dc.title
Applications of neural attention for modelling long-range dependencies
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2025.128663
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Matthias Gerold Wödlinger
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E193 - Institut für Visual Computing and Human-Centered Technology
-
dc.type.qualificationlevel
Doctoral
-
dc.identifier.libraryid
AC17413540
-
dc.description.numberOfPages
114
-
dc.thesistype
Dissertation
de
dc.thesistype
Dissertation
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.advisor.orcid
0000-0003-4195-1593
-
item.languageiso639-1
en
-
item.grantfulltext
open
-
item.cerifentitytype
Publications
-
item.openairetype
doctoral thesis
-
item.openairecristype
http://purl.org/coar/resource_type/c_db06
-
item.fulltext
with Fulltext
-
item.openaccessfulltext
Open Access
-
crisitem.author.dept
E193-01 - Forschungsbereich Computer Vision
-
crisitem.author.parentorg
E193 - Institut für Visual Computing and Human-Centered Technology