<div class="csl-bib-body">
<div class="csl-entry">Peer, M., Kleber, F., & Sablatnig, R. (2024). SAGHOG: Self-supervised Autoencoder for Generating HOG Features for Writer Retrieval. In <i>Document Analysis and Recognition - ICDAR 2024</i> (pp. 121–138). https://doi.org/10.1007/978-3-031-70536-6_8</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/203696
-
dc.description.abstract
This paper introduces SAGHOG, a self-supervised pretraining strategy for writer retrieval using HOG features of the binarized input image. Our preprocessing applies the Segment Anything technique to extract handwriting from various datasets, yielding about 24k documents; we then train a vision transformer to reconstruct masked patches of the handwriting. SAGHOG is then finetuned by appending NetRVLAD as an encoding layer to the pretrained encoder. Evaluation of our approach on three historical datasets, Historical-WI, HisFrag20, and GRK-Papyri, demonstrates the effectiveness of SAGHOG for writer retrieval. Additionally, we provide ablation studies on our architecture and evaluate unsupervised and supervised finetuning. Notably, on HisFrag20, SAGHOG outperforms related work with an mAP of 57.2%, a margin of 11.6% over the current state of the art, showcasing its robustness on challenging data. It is also competitive on small datasets such as GRK-Papyri, where we achieve a Top-1 accuracy of 58.0%.
en
dc.language.iso
en
-
dc.relation.ispartofseries
Lecture Notes in Computer Science
-
dc.subject
Document Analysis
en
dc.subject
Masked Autoencoder
en
dc.subject
Self-Supervised Learning
en
dc.subject
Writer Retrieval
en
dc.title
SAGHOG: Self-supervised Autoencoder for Generating HOG Features for Writer Retrieval
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.relation.isbn
978-3-031-70536-6
-
dc.description.startpage
121
-
dc.description.endpage
138
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
Document Analysis and Recognition - ICDAR 2024
-
tuw.container.volume
14805
-
tuw.peerreviewed
true
-
tuw.researchTopic.id
I5
-
tuw.researchTopic.name
Visual Computing and Human-Centered Technology
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E193-01 - Forschungsbereich Computer Vision
-
tuw.publisher.doi
10.1007/978-3-031-70536-6_8
-
dc.description.numberOfPages
18
-
tuw.author.orcid
0000-0001-6843-0830
-
tuw.author.orcid
0000-0001-8351-5066
-
tuw.author.orcid
0000-0003-4195-1593
-
tuw.event.name
18th International Conference on Document Analysis and Recognition (ICDAR 2024)
en
tuw.event.startdate
30-08-2024
-
tuw.event.enddate
04-09-2024
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Athens
-
tuw.event.country
GR
-
tuw.event.presenter
Peer, Marco
-
wb.sciencebranch
Computer Science
-
wb.sciencebranch
Mathematics
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
1010
-
wb.sciencebranch.value
90
-
wb.sciencebranch.value
10
-
item.languageiso639-1
en
-
item.openairetype
conference paper
-
item.grantfulltext
none
-
item.fulltext
no Fulltext
-
item.cerifentitytype
Publications
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
crisitem.author.dept
E193-01 - Forschungsbereich Computer Vision
-
crisitem.author.dept
E193-01 - Forschungsbereich Computer Vision
-
crisitem.author.dept
E193 - Institut für Visual Computing and Human-Centered Technology
-
crisitem.author.orcid
0000-0001-6843-0830
-
crisitem.author.orcid
0000-0001-8351-5066
-
crisitem.author.orcid
0000-0003-4195-1593
-
crisitem.author.parentorg
E193 - Institut für Visual Computing and Human-Centered Technology
-
crisitem.author.parentorg
E193 - Institut für Visual Computing and Human-Centered Technology