<div class="csl-bib-body">
<div class="csl-entry">Sagi, T., Zaga, M., Rusinek, S., Fekete, M. R., Bjerva, J., &amp; Hose, K. (2025). Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype. <i>Language Resources and Evaluation</i>, <i>59</i>(3), 2427–2451. https://doi.org/10.1007/s10579-025-09812-9</div>
</div>
-
dc.identifier.issn
1574-020X
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/218552
-
dc.description.abstract
The writings of one ancient civilization often overlap in time and space with others. Many of these sources comprise unstructured text in ancient languages, causing scholars studying these civilizations to be siloed, often relying on sources in specific languages. Most recent efforts to extract structured information from historical scripts into place (toponym) and people databases (prosopographies) have followed this pattern, focusing on one civilization and selected sources. The path to creating a common database runs through aligning names or toponyms between sources from disparate languages utilizing different scripts. Existing multi-lingual orthographic (string-based) comparison often relies on transliteration to a common script (Latin/English). Transliteration often creates multiple options and even more confusion. However, when integrating sources that overlap in space and time, the languages often share a common phonetic background. This commonality may prove beneficial. In this work, we present a benchmark for comparing toponyms from two linguistically and culturally related languages, namely Hebrew and Arabic. We provide a benchmark comprising a set of dataset pairs created from historical sources written in Medieval variants of these languages, later historical gazetteers, and a modern dataset curated from Wikidata. We empirically evaluate several toponym comparison approaches over the benchmark: transliteration to a common script, direct transliteration, and phonetic comparison using a common phonetic representation. We discuss the results and the limitations of the various methods and outline future work.
en
dc.language.iso
en
-
dc.publisher
SPRINGER
-
dc.relation.ispartof
Language Resources and Evaluation
-
dc.subject
Grapheme to phoneme
en
dc.subject
Multi-lingual
en
dc.subject
Toponym matching
en
dc.subject
Transliteration
en
dc.title
Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype