<div class="csl-bib-body">
<div class="csl-entry">Peer, M., Kleber, F., & Sablatnig, R. (2022). Writer Retrieval using Compact Convolutional Transformers and NetMVLAD. In <i>2022 26th International Conference on Pattern Recognition (ICPR)</i> (pp. 1571–1578). https://doi.org/10.1109/ICPR56361.2022.9956155</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/150348
-
dc.description.abstract
This paper presents a method for writer retrieval in which embeddings of patches extracted at SIFT keypoint locations are learned by a Compact Convolutional Transformer (CCT), a modified attention-based transformer architecture that includes convolutions, followed by a NetMVLAD layer and Generalized Max Pooling (GMP) to obtain global page descriptors. We introduce the application of CCTs to writer retrieval and show that they outperform ResNet18, the Convolutional Neural Network (CNN) used in current state-of-the-art methods for writer retrieval, while having only one-third of the parameters. Additionally, we propose NetMVLAD, an extension of NetVLAD with multiple vocabularies, which encodes information with different vocabulary sizes and improves on the original NetVLAD. We evaluate CCTs against ResNet18 on the ICDAR2013 Competition on Writer Identification dataset (ICDAR2013) and the CVL dataset, and show the effect of multiple vocabularies applied within the NetVLAD layer. CCT7 pretrained on CIFAR-100 combined with NetMVLAD achieves 89.3% Mean Average Precision (mAP) on the ICDAR2013 dataset and 96.5% on the CVL dataset.
en
dc.description.sponsorship
FFG - Österr. Forschungsförderungsgesellschaft mbH
-
dc.language.iso
en
-
dc.subject
Writer Retrieval
en
dc.subject
Document Analysis
en
dc.subject
Document Retrieval
en
dc.subject
NetVLAD
en
dc.subject
Vision Transformers
en
dc.title
Writer Retrieval using Compact Convolutional Transformers and NetMVLAD
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.relation.isbn
978-1-6654-9062-7
-
dc.description.startpage
1571
-
dc.description.endpage
1578
-
dc.relation.grantno
879687
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
2022 26th International Conference on Pattern Recognition (ICPR)
-
tuw.container.volume
2022-August
-
tuw.peerreviewed
true
-
tuw.project.title
IT unterstützte Suche und Vergleich von Handschriften
-
tuw.researchTopic.id
I5
-
tuw.researchTopic.name
Visual Computing and Human-Centered Technology
-
tuw.researchTopic.value
100
-
tuw.linking
https://www.icpr2022.com/submission-guidelines/
-
tuw.publication.orgunit
E193-01 - Forschungsbereich Computer Vision
-
tuw.publisher.doi
10.1109/ICPR56361.2022.9956155
-
dc.description.numberOfPages
8
-
tuw.author.orcid
0000-0001-8351-5066
-
tuw.author.orcid
0000-0003-4195-1593
-
tuw.event.name
2022 26th International Conference on Pattern Recognition (ICPR)
en
tuw.event.startdate
21-08-2022
-
tuw.event.enddate
25-08-2022
-
tuw.event.online
Hybrid
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Montréal
-
tuw.event.country
CA
-
tuw.event.presenter
Peer, Marco
-
wb.sciencebranch
Informatik
-
wb.sciencebranch
Mathematik
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
1010
-
wb.sciencebranch.value
90
-
wb.sciencebranch.value
10
-
item.openairetype
conference paper
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.languageiso639-1
en
-
item.fulltext
no Fulltext
-
item.cerifentitytype
Publications
-
item.grantfulltext
restricted
-
crisitem.project.funder
FFG - Österr. Forschungsförderungsgesellschaft mbH
-
crisitem.project.grantno
879687
-
crisitem.author.dept
E193-01 - Forschungsbereich Computer Vision
-
crisitem.author.dept
E193-01 - Forschungsbereich Computer Vision
-
crisitem.author.dept
E193 - Institut für Visual Computing and Human-Centered Technology
-
crisitem.author.orcid
0000-0001-6843-0830
-
crisitem.author.orcid
0000-0001-8351-5066
-
crisitem.author.orcid
0000-0003-4195-1593
-
crisitem.author.parentorg
E193 - Institut für Visual Computing and Human-Centered Technology
-
crisitem.author.parentorg
E193 - Institut für Visual Computing and Human-Centered Technology