SAGHOG: Self-supervised Autoencoder for Generating HOG Features for Writer Retrieval

Peer, Marco; Kleber, Florian; Sablatnig, Robert

doi:10.1007/978-3-031-70536-6_8

DC Field

Value

Language

dc.contributor.author

Peer, Marco

dc.contributor.author

Kleber, Florian

dc.contributor.author

Sablatnig, Robert

dc.date.accessioned

2024-10-28T16:56:07Z

dc.date.available

2024-10-28T16:56:07Z

dc.date.issued

2024

dc.identifier.citation

Peer, M., Kleber, F., & Sablatnig, R. (2024). SAGHOG: Self-supervised Autoencoder for Generating HOG Features for Writer Retrieval. In Document Analysis and Recognition - ICDAR 2024 (pp. 121–138). https://doi.org/10.1007/978-3-031-70536-6_8

dc.identifier.uri

http://hdl.handle.net/20.500.12708/203696

dc.description.abstract

This paper introduces SAGHOG, a self-supervised pretraining strategy for writer retrieval that uses HOG features of the binarized input image. Our preprocessing applies the Segment Anything technique to extract handwriting from various datasets, yielding about 24k documents; a vision transformer is then trained to reconstruct masked patches of the handwriting. SAGHOG is then fine-tuned by appending NetRVLAD as an encoding layer to the pretrained encoder. Evaluation on three historical datasets, Historical-WI, HisFrag20, and GRK-Papyri, demonstrates the effectiveness of SAGHOG for writer retrieval. Additionally, we provide ablation studies on our architecture and evaluate unsupervised and supervised fine-tuning. Notably, on HisFrag20, SAGHOG outperforms related work with a mAP of 57.2%, a margin of 11.6% over the current state of the art, showcasing its robustness on challenging data, and it is competitive even on small datasets such as GRK-Papyri, where we achieve a Top-1 accuracy of 58.0%.

dc.language.iso

dc.relation.ispartofseries

Lecture Notes in Computer Science

dc.subject

Document Analysis

dc.subject

Masked Autoencoder

dc.subject

Self-Supervised Learning

dc.subject

Writer Retrieval

dc.title

SAGHOG: Self-supervised Autoencoder for Generating HOG Features for Writer Retrieval

dc.type

Inproceedings

dc.type

Conference contribution

dc.relation.isbn

978-3-031-70536-6

dc.description.startpage

121

dc.description.endpage

138

dc.type.category

Full-Paper Contribution

tuw.booktitle

Document Analysis and Recognition - ICDAR 2024

tuw.container.volume

14805

tuw.peerreviewed

true

tuw.researchTopic.id

tuw.researchTopic.name

Visual Computing and Human-Centered Technology

tuw.researchTopic.value

100

tuw.publication.orgunit

E193-01 - Forschungsbereich Computer Vision

tuw.publisher.doi

10.1007/978-3-031-70536-6_8

dc.description.numberOfPages

tuw.author.orcid

0000-0001-6843-0830

tuw.author.orcid

0000-0001-8351-5066

tuw.author.orcid

0000-0003-4195-1593

tuw.event.name

18th International Conference on Document Analysis and Recognition (ICDAR 2024)

tuw.event.startdate

30-08-2024

tuw.event.enddate

04-09-2024

tuw.event.online

On Site

tuw.event.type

Event for scientific audience

tuw.event.place

Athens

tuw.event.country

tuw.event.presenter

Peer, Marco

wb.sciencebranch

Computer Science

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.languageiso639-1

item.openairetype

conference paper

item.grantfulltext

none

item.fulltext

no Fulltext

item.cerifentitytype

Publications

item.openairecristype

http://purl.org/coar/resource_type/c_5794

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.dept

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.author.orcid

0000-0001-6843-0830

crisitem.author.orcid

0000-0001-8351-5066

crisitem.author.orcid

0000-0003-4195-1593

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.author.parentorg

E180 - Fakultät für Informatik
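
The abstract reports retrieval quality as mAP and Top-1 accuracy. As a rough illustration of what these numbers measure, the sketch below computes both for a leave-one-out retrieval setup. It is not the authors' evaluation code: the function names, the use of cosine distance, and the choice to skip queries with no other sample from the same writer are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)


def retrieval_metrics(
    descriptors: List[Sequence[float]], writers: List[str]
) -> Tuple[float, float]:
    """Return (mAP, Top-1 accuracy) for leave-one-out retrieval.

    Each sample is used once as the query; all other samples are ranked
    by distance to it. A ranked sample is relevant if it shares the
    query's writer label.
    """
    average_precisions = []
    top1_hits = 0
    for q, (q_desc, q_writer) in enumerate(zip(descriptors, writers)):
        # Rank every other sample by increasing distance to the query.
        ranked = sorted(
            (i for i in range(len(descriptors)) if i != q),
            key=lambda i: cosine_distance(q_desc, descriptors[i]),
        )
        relevant = [writers[i] == q_writer for i in ranked]
        if not any(relevant):
            continue  # assumption: skip queries without a relevant sample
        top1_hits += int(relevant[0])
        # Average precision: mean of precision@k over the relevant ranks.
        hits = 0
        precision_sum = 0.0
        for rank, is_relevant in enumerate(relevant, start=1):
            if is_relevant:
                hits += 1
                precision_sum += hits / rank
        average_precisions.append(precision_sum / hits)
    if not average_precisions:
        raise ValueError("no query has a relevant sample")
    n_queries = len(average_precisions)
    return sum(average_precisions) / n_queries, top1_hits / n_queries


# Toy data: two writers, two page descriptors each (made-up values).
toy_descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_writers = ["a", "a", "b", "b"]
toy_map, toy_top1 = retrieval_metrics(toy_descriptors, toy_writers)
```

In this toy case every query's nearest neighbour is written by the same writer, so both metrics evaluate to 1.0. On real data the descriptors would be the page-level encodings produced by the fine-tuned network.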

Appears in Collections:

Conference Paper
