<div class="csl-bib-body">
<div class="csl-entry">Stippel, C., Pratellesi, C., Schwendinger, B., & Hoch, R. (2025). A Deep Learning Approach for Event-Driven Duplicate Detection. In <i>Proceedings of The 8th International Conference on Information and Computer Technologies</i> (pp. 151–156). https://doi.org/10.1109/ICICT64582.2025.00030</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/222802
-
dc.description.abstract
Clean datasets are paramount in many industrial applications as they are the foundation for the analytic processes and services offered to customers. In this regard, duplicates are a common problem when multiple entries represent the same real-world entity. Duplicate detection is the process of identifying these entries and has been intensively studied in research and practice. However, existing methods hardly take into account the underlying real events that lead to duplicates. We argue that duplicates resulting from such events, e.g., relocation of a person, exhibit specific characteristics and, hence, can hardly be identified by existing approaches, which commonly focus on high similarity with respect to attribute values. To address this issue, we propose an event-driven duplicate detection approach based on deep learning that does not rely on limiting assumptions and can detect duplicates arising from multiple real-world events. Specifically, we combine neural networks with classical string distance metrics to classify duplicate records by learning the subtle differences and similarities between pairs of records. We demonstrate our approach's practical applicability and effectiveness using a real-world scenario by analyzing duplicates of an address book. The evaluation shows that our approach is reliable and useful for decision support, while outperforming existing approaches for duplicate detection.
en
dc.description.sponsorship
FFG - Österr. Forschungsförderungsgesellschaft mbH
-
dc.language.iso
en
-
dc.subject
deep learning
en
dc.subject
duplicate detection
en
dc.subject
record linkage
en
dc.subject
siamese networks
en
dc.subject
string distance metrics
en
dc.title
A Deep Learning Approach for Event-Driven Duplicate Detection
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.relation.isbn
9798331505189
-
dc.description.startpage
151
-
dc.description.endpage
156
-
dc.relation.grantno
898626
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
Proceedings of The 8th International Conference on Information and Computer Technologies
-
tuw.peerreviewed
true
-
tuw.project.title
Quantifying Trustworthiness of Data
-
tuw.researchTopic.id
I5
-
tuw.researchTopic.name
Visual Computing and Human-Centered Technology
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E193-01 - Forschungsbereich Computer Vision
-
tuw.publisher.doi
10.1109/ICICT64582.2025.00030
-
dc.description.numberOfPages
6
-
tuw.author.orcid
0000-0003-0482-902X
-
tuw.author.orcid
0009-0000-0455-9563
-
tuw.author.orcid
0000-0003-3315-8114
-
tuw.author.orcid
0000-0002-8131-1091
-
tuw.event.name
The 8th International Conference on Information and Computer Technologies
en
tuw.event.startdate
14-03-2025
-
tuw.event.enddate
16-03-2025
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.country
US
-
tuw.event.presenter
Hoch, Ralph
-
wb.sciencebranch
Computer Sciences
-
wb.sciencebranch
Mathematics
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
1010
-
wb.sciencebranch.value
90
-
wb.sciencebranch.value
10
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.grantfulltext
none
-
item.cerifentitytype
Publications
-
item.fulltext
no Fulltext
-
item.openairetype
conference paper
-
item.languageiso639-1
en
-
crisitem.project.funder
FFG - Österr. Forschungsförderungsgesellschaft mbH
-
crisitem.project.grantno
898626
-
crisitem.author.dept
E193-01 - Forschungsbereich Computer Vision
-
crisitem.author.dept
E384-01 - Forschungsbereich Software-intensive Systems
-
crisitem.author.dept
E384-01 - Forschungsbereich Software-intensive Systems
-
crisitem.author.dept
E384-01 - Forschungsbereich Software-intensive Systems
-
crisitem.author.orcid
0000-0003-3315-8114
-
crisitem.author.parentorg
E193 - Institut für Visual Computing and Human-Centered Technology