Writer Retrieval using Compact Convolutional Transformers and NetMVLAD

Peer, Marco; Kleber, Florian; Sablatnig, Robert

doi:10.1109/ICPR56361.2022.9956155

DC Element

Value

Language

dc.contributor.author

Peer, Marco

dc.contributor.author

Kleber, Florian

dc.contributor.author

Sablatnig, Robert

dc.date.accessioned

2023-02-09T15:06:31Z

dc.date.available

2023-02-09T15:06:31Z

dc.date.issued

2022

dc.identifier.citation

Peer, M., Kleber, F., & Sablatnig, R. (2022). Writer Retrieval using Compact Convolutional Transformers and NetMVLAD. In 2022 26th International Conference on Pattern Recognition (ICPR) (pp. 1571–1578). https://doi.org/10.1109/ICPR56361.2022.9956155

dc.identifier.uri

http://hdl.handle.net/20.500.12708/150348

dc.description.abstract

This paper presents a method for writer retrieval in which embeddings of patches extracted at SIFT keypoint locations are learned by a Compact Convolutional Transformer (CCT), a modified attention-based transformer architecture that includes convolutions. The CCT is followed by a NetMVLAD layer and Generalized Max Pooling (GMP) to obtain global page descriptors. We introduce CCTs to writer retrieval and show that they outperform ResNet18, the Convolutional Neural Network (CNN) used in current state-of-the-art methods for writer retrieval, while having only one-third of the parameters. Additionally, we propose NetMVLAD, an extension of NetVLAD with multiple vocabularies that encodes information with different vocabulary sizes and improves on the original NetVLAD. We evaluate CCTs against ResNet18 on the ICDAR2013 Competition on Writer Identification dataset (ICDAR2013) and the CVL dataset, and show the effect of multiple vocabularies within the NetVLAD layer. CCT7 pretrained on CIFAR-100 combined with NetMVLAD achieves 89.3% Mean Average Precision (mAP) on the ICDAR2013 dataset and 96.5% on the CVL dataset.

dc.description.sponsorship

FFG - Österr. Forschungsförderungsgesellschaft mbH

dc.language.iso

dc.subject

Writer Retrieval

dc.subject

Document Analysis

dc.subject

Document Retrieval

dc.subject

NetVLAD

dc.subject

Vision Transformers

dc.title

Writer Retrieval using Compact Convolutional Transformers and NetMVLAD

dc.type

Inproceedings

dc.type

Conference contribution

dc.relation.isbn

978-1-6654-9062-7

dc.description.startpage

1571

dc.description.endpage

1578

dc.relation.grantno

879687

dc.type.category

Full-Paper Contribution

tuw.booktitle

2022 26th International Conference on Pattern Recognition (ICPR)

tuw.container.volume

2022-August

tuw.peerreviewed

true

tuw.project.title

IT unterstützte Suche und Vergleich von Handschriften

tuw.researchTopic.id

tuw.researchTopic.name

Visual Computing and Human-Centered Technology

tuw.researchTopic.value

100

tuw.linking

https://www.icpr2022.com/submission-guidelines/

tuw.publication.orgunit

E193-01 - Forschungsbereich Computer Vision

tuw.publisher.doi

10.1109/ICPR56361.2022.9956155

dc.description.numberOfPages

tuw.author.orcid

0000-0001-8351-5066

tuw.author.orcid

0000-0003-4195-1593

tuw.event.name

2022 26th International Conference on Pattern Recognition (ICPR)

tuw.event.startdate

21-08-2022

tuw.event.enddate

25-08-2022

tuw.event.online

Hybrid

tuw.event.type

Event for scientific audience

tuw.event.place

Montréal

tuw.event.country

tuw.event.presenter

Peer, Marco

wb.sciencebranch

Computer Science

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.languageiso639-1

item.openairetype

conference paper

item.grantfulltext

restricted

item.fulltext

no Fulltext

item.cerifentitytype

Publications

item.openairecristype

http://purl.org/coar/resource_type/c_5794

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.dept

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.author.orcid

0000-0001-6843-0830

crisitem.author.orcid

0000-0001-8351-5066

crisitem.author.orcid

0000-0003-4195-1593

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.author.parentorg

E180 - Fakultät für Informatik

crisitem.project.funder

FFG - Österr. Forschungsförderungsgesellschaft mbH

crisitem.project.grantno

879687

Appears in collections:

Conference Paper
