Solomakhina, N. (2014). Combining ontologies and statistics for sensor data quality improvement [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2014.23187
In large industries, the use of advanced technology and modern equipment brings the problem of storing, interpreting, and analyzing huge amounts of information. Typical sources of this data include a myriad of sensors mounted on industrial machinery, measuring quantities such as temperature, movement and vibration, pressure, and many more. However, these sensors are complex technical devices, which means that they can fail and their readings can become unreliable, or "dirty". Low-quality data makes it hard to solve the original task of assessing system and process status and controlling system behavior. Data quality is therefore one of the major challenges, given the rapid growth of information, the fragmentation of information systems, incorrect data formatting, and other issues. The aim of this thesis is to propose a novel approach to addressing data quality issues in industrial datasets, in particular measurements from sensors mounted at power generation facilities. The most common approach to detecting anomalies in data is analysis by means of statistical and machine learning techniques. However, analyzing the data alone cannot always give satisfactory results. For instance, suspicious sensor readings may indicate not bad data quality but abnormal functioning of an appliance detected by this sensor. Therefore, we propose to use additional available information about the domain. The approach presented in this work brings together several well-known techniques from the worlds of computational logic and statistics, improving the results of the data quality assessment and improvement procedure. The application domain and the dependencies between its objects are represented as a knowledge-based model, while statistics identifies data anomalies, such as outlying or missing values, in sensor measurement data.
In this work we represent domain knowledge in an OWL ontology, which covers the topology of industrial equipment and information about the measuring devices installed. Providing statistical computations with additional information from the model makes it possible to validate and improve the results. Thus, comparing and analyzing readings from sensors of the same type mounted on the same component of an appliance helps to identify possibly damaged sensors, and to distinguish data quality inconsistencies found in a single sensor's readings from anomalies in machinery functioning that are also detected by other measuring devices. Based on the proposed approach, a software demonstrator has been implemented and tested, demonstrating that the additional information provided by the semantic model improves the results of statistical analysis.
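The idea of cross-checking a statistical finding against sensors of the same type on the same component can be illustrated with a minimal Python sketch. It is not the thesis's demonstrator. The dictionary `SENSOR_MODEL` is a hypothetical stand-in for the OWL ontology, the sensor and component names are invented, and the outlier test (a modified z-score based on the median and the median absolute deviation, with a threshold of 3.5) and the "at least half of the peers agree" rule are assumptions chosen for illustration only.

```python
from statistics import median

# Hypothetical stand-in for the ontology: sensor -> (component, sensor type).
SENSOR_MODEL = {
    "T1": ("bearing_A", "temperature"),
    "T2": ("bearing_A", "temperature"),
    "T3": ("bearing_A", "temperature"),
}


def outlier_indices(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Missing readings (None) are skipped here; they are not treated as outliers.
    """
    present = [v for v in values if v is not None]
    if not present:
        return set()
    med = median(present)
    mad = median([abs(v - med) for v in present])
    if mad == 0:
        return set()  # no spread, so the score is undefined; flag nothing
    return {
        i
        for i, v in enumerate(values)
        if v is not None and 0.6745 * abs(v - med) / mad > threshold
    }


def classify(readings, model=SENSOR_MODEL):
    """Label each statistical outlier using peers from the domain model.

    An outlier confirmed by at least half of the peer sensors (same type,
    same component) is labelled as a machinery anomaly (assumed rule);
    otherwise it is labelled as a data quality issue of that single sensor.
    """
    flagged = {s: outlier_indices(v) for s, v in readings.items()}
    labels = {}
    for sensor, indices in flagged.items():
        peers = [p for p in readings if p != sensor and model[p] == model[sensor]]
        for i in indices:
            agreeing = sum(1 for p in peers if i in flagged[p])
            if peers and 2 * agreeing >= len(peers):
                labels[(sensor, i)] = "machinery_anomaly"
            else:
                labels[(sensor, i)] = "sensor_data_quality_issue"
    return labels
```

Calling `classify` with a dictionary that maps each sensor name to an equally long list of readings returns one label per flagged `(sensor, index)` pair. A spike seen by only one sensor is attributed to that sensor, while a spike seen at the same index by its peers is attributed to the machinery.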
en
Additional information:
Title differs according to the author's own translation. Abstract in German. Bibliography: pp. 83-91.