Large-scale knowledge graphs are commonly used in software products ranging from web applications to the control software of self-driving vehicles. Due to their size, these graphs are usually built either by employing a crowd of workers or by scraping existing information from the web. Both approaches require the collected data to be validated and improved before it is suitable for use in production-ready systems. While much current research aims to explore and improve the algorithms required for this task, it is hampered by the lack of annotated datasets containing typical human mistakes (or ambiguities), such as those caused by ambiguous questions or answers. This problem intensifies if graphs have to follow certain restrictions to be of value (e.g. containing specific relation types or classes of nodes used by an existing system), and may even be impossible to solve if specialised expert graphs are required whose contents non-experts would struggle to comprehend. In addition, no existing solution is capable of leveraging the structure of a knowledge graph as the basis for the artificial generation of mistakes. To address this issue, in this thesis we propose a vector-embedding-based approach called "AmbiVec" to enrich arbitrary graphs with generated, human-like mistakes similar to those introduced by crowd workers or web-scraping approaches. To this end, the adopted methodology includes (1) a literature study to investigate the most prevalent sources of ambiguity during crowdsourcing and to categorise the mistakes they cause; (2) based on these findings, the design and implementation of AmbiVec, which generates configurable amounts of artificial mistakes using vector embeddings and the similarity between elements in the graph, so that the mistakes can easily be used for research; and (3) an evaluation of the approach through crowdsourcing. Our evaluation shows that the approach works well for low-severity mistakes of the kind commonly caused by existing crowd-based approaches: user ratings of severity correlate well with the configured severity, and workers categorised a portion of our generated ambiguities as human-like.
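
To make the core mechanism concrete, the following is a minimal sketch of one way a graph triple could be corrupted by swapping an entity for an embedding-space neighbour whose similarity matches a configured severity. The toy embeddings, entity names, and the `corrupt_entity` helper are illustrative assumptions, not the actual AmbiVec implementation; in practice the vectors would come from a trained knowledge-graph embedding model.

```python
import numpy as np

# Hypothetical toy embeddings for graph entities; real vectors would come
# from a trained knowledge-graph embedding model (assumption, not AmbiVec).
EMBEDDINGS = {
    "Paris":  np.array([0.9, 0.1, 0.0]),
    "Lyon":   np.array([0.8, 0.2, 0.1]),
    "Berlin": np.array([0.1, 0.9, 0.0]),
    "Tokyo":  np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def corrupt_entity(entity, severity):
    """Replace `entity` with the candidate whose similarity best matches the
    requested severity: low severity -> very similar substitute (a plausible,
    human-like confusion); high severity -> a distant, obvious mistake."""
    target = 1.0 - severity  # e.g. severity 0.1 -> aim for ~0.9 similarity
    candidates = [
        (abs(cosine(EMBEDDINGS[entity], vec) - target), name)
        for name, vec in EMBEDDINGS.items()
        if name != entity
    ]
    return min(candidates)[1]

# Corrupt the object of a (subject, relation, object) triple.
triple = ("France", "capital", "Paris")
corrupted = (triple[0], triple[1], corrupt_entity(triple[2], severity=0.1))
print(corrupted)  # low severity swaps in a similar city, here ("France", "capital", "Lyon")
```

Mapping low severity to high embedding similarity is the design choice that makes the generated errors resemble human confusions rather than random noise: a crowd worker is far more likely to confuse Paris with Lyon than with Tokyo.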