<div class="csl-bib-body">
<div class="csl-entry">Kittenberger, P. (2021). <i>Generating knowledge Graphs with specified ambiguities</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.66143</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2021.66143
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/18025
-
dc.description.abstract
Large-scale knowledge graphs are commonly used in software products ranging from web applications to the control software of self-driving vehicles. Because of their size, these graphs are usually built either by employing a crowd of people or by scraping information that already exists on the web. Both approaches require the collected data to be validated and improved before it is suitable for production-ready systems. While much current research aims to explore and improve the algorithms required for this task, it is hampered by the lack of annotated datasets containing typical human mistakes (or ambiguities), such as those caused by ambiguous questions or answers. This problem intensifies if graphs have to follow certain restrictions to be of value (e.g., containing specific relation types or classes of nodes used by an existing system), and it may even be impossible to solve if specific expert graphs are required whose contents non-experts would struggle to comprehend. In addition, no existing solution is currently capable of leveraging the structure of a knowledge graph as a basis for the artificial generation of mistakes. To address this issue, this thesis proposes a vector-embedding-based approach called "AmbiVec" that enriches arbitrary graphs with generated, human-like mistakes similar to those made by crowd workers or web-scraping approaches. To this end, the adopted methodology comprises (1) a literature study to investigate the most prevalent sources of ambiguity during crowdsourcing and to categorise the mistakes they cause; (2) based on these findings, the design and implementation of "AmbiVec", which generates configurable amounts of artificial mistakes by using vector embeddings and leveraging the similarity between elements in the graph, so that the mistakes can then easily be used for research; and (3) an evaluation of the approach using a crowdsourcing method.
Our evaluation shows that the approach works well for low-severity mistakes, which are commonly caused by existing crowd-based approaches. User ratings of severity correlate well with the configured severity, and workers categorised a portion of the generated ambiguities as human-like.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
generieren
de
dc.subject
Ambiguität
de
dc.subject
Knowledge Graph
de
dc.subject
Worteinbettung
de
dc.subject
menschenähnliche Fehler
de
dc.subject
Fehler
de
dc.subject
Quellen der Ambiguität
de
dc.subject
AmbiVec
de
dc.subject
generating
en
dc.subject
ambiguities
en
dc.subject
knowledge graph
en
dc.subject
vector embedding
en
dc.subject
human-like mistakes
en
dc.subject
mistake
en
dc.subject
sources of ambiguity
en
dc.subject
AmbiVec
en
dc.title
Generating knowledge Graphs with specified ambiguities
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2021.66143
-
dc.contributor.affiliation
TU Wien, Austria
-
dc.rights.holder
Peter Kittenberger
-
dc.publisher.place
Vienna
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Biffl, Stefan
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering