Generating knowledge Graphs with specified ambiguities

Kittenberger, Peter

doi:10.34726/hss.2021.66143

DC Field

Value

Language

dc.contributor.advisor

Sabou, Reka Marta

dc.contributor.author

Kittenberger, Peter

dc.date.accessioned

2021-07-08T11:52:05Z

dc.date.issued

2021

dc.date.submitted

2021-06

dc.identifier.citation

Kittenberger, P. (2021). Generating knowledge Graphs with specified ambiguities [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.66143

dc.identifier.uri

https://doi.org/10.34726/hss.2021.66143

dc.identifier.uri

http://hdl.handle.net/20.500.12708/18025

dc.description.abstract

Large-scale knowledge graphs are commonly used in software products ranging from web applications to the control software of self-driving vehicles. Due to their size, these graphs are usually built either by employing a crowd of people or by scraping information that already exists on the web. Both approaches require the collected data to be validated and improved before it is suitable for use in production-ready systems. While much current research aims to explore and improve the algorithms required for this task, it is hampered by the lack of annotated datasets containing typical human mistakes (or ambiguities) such as those caused by ambiguous questions or answers. This problem intensifies if graphs have to follow certain restrictions to be of value (e.g. containing specific relation types or classes of nodes as used by an existing system), and may even be impossible to solve if specific expert graphs are required whose contents non-experts would struggle to comprehend. In addition, there is currently no solution capable of leveraging the structure of a knowledge graph as a basis for the artificial generation of mistakes. To address this issue, in this thesis we propose a vector-embedding-based approach called "AmbiVec" to enrich arbitrary graphs with generated, human-like mistakes similar to those made by crowd workers or web scraping approaches. To this end, the adopted methodology includes (1) a literature study to investigate the most prevalent sources of ambiguities during crowdsourcing and to categorise the mistakes that they cause; (2) based on these findings, the design and implementation of an approach, "AmbiVec", that generates configurable amounts of artificial mistakes, using vector embeddings and leveraging similarity between elements in the graph, so that the mistakes can then easily be used for research; (3) an evaluation of the approach using a crowdsourcing method.
Our evaluation shows that our approach works well for mistakes of low severity that are commonly caused by existing crowd-based approaches. User ratings of the severity correlate well with the configured severity, and workers categorised a portion of our generated ambiguities as being human-like.
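
The general idea named in the abstract, generating a mistake by swapping a graph element for one that is similar in embedding space, with a configurable severity, can be illustrated with a minimal sketch. This is not the thesis's AmbiVec implementation: the function names, the toy two-dimensional embeddings, the example triple, and the mapping from a severity in [0, 1] to a similarity rank are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_by_similarity(entity, embeddings):
    """Return all other entities, most similar to `entity` first."""
    target = embeddings[entity]
    others = [e for e in embeddings if e != entity]
    return sorted(others, key=lambda e: cosine(target, embeddings[e]), reverse=True)


def inject_mistake(triple, embeddings, severity):
    """Replace the object of a (subject, relation, object) triple.

    Hypothetical severity scale: 0.0 picks the nearest neighbour of the
    original object (a subtle mistake), 1.0 picks the least similar entity
    (a blatant mistake).
    """
    subject, relation, obj = triple
    ranked = rank_by_similarity(obj, embeddings)
    index = round(severity * (len(ranked) - 1))
    return (subject, relation, ranked[index])


# Toy, hand-made embeddings (assumed values, not taken from the thesis).
embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "wolf": [0.7, 0.3],
    "car": [0.0, 1.0],
}

triple = ("kitten", "is_a", "cat")
subtle = inject_mistake(triple, embeddings, severity=0.0)
blatant = inject_mistake(triple, embeddings, severity=1.0)
print(subtle)   # ('kitten', 'is_a', 'dog')
print(blatant)  # ('kitten', 'is_a', 'car')
```

In this sketch a low severity yields a replacement that a human could plausibly confuse with the original, while a high severity yields an obviously wrong one; a real system would presumably learn the embeddings from the graph itself rather than use hand-written vectors.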

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

generieren

dc.subject

Ambiguität

dc.subject

Knowledge Graph

dc.subject

Worteinbettung

dc.subject

menschenähnliche Fehler

dc.subject

Fehler

dc.subject

Quellen der Ambiguität

dc.subject

AmbiVec

dc.subject

generating

dc.subject

ambiguities

dc.subject

knowledge graph

dc.subject

vector embedding

dc.subject

human-like mistakes

dc.subject

mistake

dc.subject

sources of ambiguity

dc.subject

AmbiVec

dc.title

Generating knowledge Graphs with specified ambiguities

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2021.66143

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Peter Kittenberger

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Biffl, Stefan

tuw.publication.orgunit

E194 - Institut für Information Systems Engineering

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC16251303

dc.description.numberOfPages

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.advisor.orcid

0000-0001-9301-8418

tuw.assistant.orcid

0000-0002-3413-7780

item.languageiso639-1

item.openairetype

master thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.openaccessfulltext

Open Access

crisitem.author.dept

TU Wien

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(1.35 MB)

In Copyright

Page view(s)

425

checked on Nov 22, 2023

Download(s)

181

checked on Nov 22, 2023
