Lepadat, M.-A. (2019). Rule-based recommender for feature engineering in big data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.65802
Feature engineering is critical to the success of many machine learning algorithms and requires domain-specific knowledge. This knowledge is generally held only by domain experts or embedded in programs. We developed a knowledge-driven approach to support users during feature engineering and implemented a software application to evaluate it. The knowledge is represented in the Web Ontology Language (OWL), which gives users a flexible way to tackle domain-specific datasets by building a reusable and comprehensible knowledge base. A semantic reasoner uses this knowledge to infer properties and provide users with recommendations. All data-related operations are performed in a scalable cluster computing engine backed by Apache Spark. The evaluation uses six freely available datasets from the domain of demographics. Only a small fraction of the recommendations proved to be wrong.
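The general idea of turning declarative knowledge into feature engineering recommendations can be illustrated with a minimal sketch. It is not the implementation from the thesis: plain Python predicates stand in for the OWL knowledge base and the semantic reasoner, no Spark is involved, and the `Column` type, the rule set, and all thresholds are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical summary of one dataset column (not from the thesis)."""
    name: str
    dtype: str            # "numeric", "categorical", or "datetime"
    distinct_count: int   # number of distinct values observed
    missing_ratio: float  # share of missing values, between 0.0 and 1.0


# Each rule pairs a condition on the column summary with a recommendation.
# Rules and thresholds are illustrative assumptions, not the thesis rules.
RULES = [
    (lambda c: c.missing_ratio > 0.5, "drop column (mostly missing)"),
    (lambda c: 0 < c.missing_ratio <= 0.5, "impute missing values"),
    (lambda c: c.dtype == "categorical" and c.distinct_count <= 10,
     "one-hot encode"),
    (lambda c: c.dtype == "categorical" and c.distinct_count > 10,
     "frequency encode"),
    (lambda c: c.dtype == "numeric", "standardize"),
    (lambda c: c.dtype == "datetime", "extract year/month/day"),
]


def recommend(column: Column) -> list[str]:
    """Return every recommendation whose condition holds, in rule order."""
    return [advice for condition, advice in RULES if condition(column)]


age = Column(name="age", dtype="numeric", distinct_count=80, missing_ratio=0.1)
country = Column(name="country", dtype="categorical", distinct_count=190,
                 missing_ratio=0.0)

for col in (age, country):
    print(col.name, recommend(col))
```

Keeping the rules as data, separate from the code that evaluates them, mirrors the motivation stated in the abstract: the knowledge stays reusable and readable, and new domain rules can be added without touching the evaluation logic.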
Language: en
Additional information:
The title differs according to the author's translation