Peer, M. (2025). Writer Retrieval for Historical Documents [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.131340
This thesis addresses writer retrieval, the task of ranking documents by their similarity to a given handwritten query sample. The primary objective is to identify documents written by the same writer, where similarity is determined by extracting features and applying a distance or similarity metric. Manual writer identification is tedious because analyzing and comparing handwriting samples requires extensive expertise and time, and the sheer volume of documents in large historical collections makes the process labor-intensive for scholars. The methodology developed in this thesis is particularly valuable for historians and paleographers working with historical documents, although it is not restricted to a particular domain of handwriting. The approaches are designed specifically for historical documents, which often suffer from degradation, lack adequate annotations, and vary significantly in writing style, language, and writing tools. Current approaches struggle with these challenges because they rely on modern, clean datasets and cannot handle variations such as the different script styles found in historical handwriting. Existing methods also frequently fail to account for document degradation and for fragments containing only small amounts of handwriting. The first part presents advancements to the general writer retrieval methodology, including two encoding schemes based on NetVLAD and a graph-based reranking strategy, aimed at improving retrieval performance on both contemporary and historical documents. The second part explores two new self-supervised learning approaches for historical handwriting. Both train deep learning models on large corpora of manuscripts to extract writer-discriminative features, and both improve writer retrieval without using any labels.
Another line of work targets Greek papyri, which are heavily degraded due to their age and for which data is sparse. Two new datasets for Greek papyri are introduced, the second of which proposes a methodology for character-specific writer retrieval. This line of work also covers fragment retrieval, which deals with torn documents such as papyri and helps scholars reconstruct and assign fragments.
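The ranking step at the core of writer retrieval, as described at the start of the abstract, can be illustrated with a minimal sketch. It is not the thesis's pipeline: it assumes each document has already been reduced to a fixed-length descriptor, uses cosine distance as one possible metric, and the names `cosine_distance`, `rank_documents`, and the toy descriptors are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_documents(query, gallery):
    """Return gallery document ids sorted by ascending distance to the query."""
    distances = {
        doc_id: cosine_distance(query, descriptor)
        for doc_id, descriptor in gallery.items()
    }
    return sorted(distances, key=distances.get)


# Toy descriptors for illustration only (assumed values, not from the thesis).
gallery = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_documents(query, gallery)
print(ranking)
```

Documents at the top of the ranking are the most likely candidates for having been written by the same hand as the query. In practice the descriptors would come from a learned feature extractor and an encoding step, and the initial ranking could be refined by a reranking stage.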
en
Additional information:
Thesis not yet received at the library; data not verified. Variant title as translated by the author.