Doppelhammer, C. (2023). Data quality assessment in large enterprises : A practical evaluation [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.93900
E194 - Institut für Information Systems Engineering
Date (published):
2023
Number of Pages:
104
Keywords:
data quality; enterprise; data quality methodology; data quality tools; data quality assessment
en
Abstract:
Data has become one of the most important driving factors for successful enterprises. Companies with reliable, complete and detailed data on their business can make informed decisions. Because sound data is imperative for decision-making, the following data quality problems need to be avoided: outdated, incomplete or, even worse, completely missing data. These problems can accumulate over time if no data management and data quality management methods are in place. Large companies with hundreds or thousands of employees can experience these problems on an entirely different scale because such organisations are often regionally distributed and split into multiple sub-organisations, which makes managing data even more challenging. Research on data quality and on managing data in business contexts has been conducted ever since companies began storing data digitally. Different methodologies have been designed to help manage data, measure its quality, and improve it. These data quality methodologies differ in their approaches and focus. Some aim to be general-purpose, while others focus on a specific application area, such as enterprises. Only limited research can be found that compares data quality methodologies against enterprise-specific requirements and recommends among them. This diploma thesis aims to address this gap in the literature by investigating two research questions: first, which data quality methodologies can be recommended with regard to enterprise requirements, and second, which software tools exist and can be recommended to enterprises with regard to their needs. To answer these questions, a requirement elicitation process was carried out in cooperation with a large enterprise, which yielded requirements reflecting its data quality challenges. Assessing data quality is the main focus of this thesis, but other aspects, such as improving data, were also considered.
The elicited requirements were checked against the capabilities of different data quality methodologies, resulting in a comparison table showing which requirements each methodology fulfils, partially fulfils, or does not fulfil. After assessing data quality methodologies, the same comparison process was performed for data quality tools. Based on the research done in this thesis, the data quality methodology Data Quality In Cooperative Information Systems (DaQuinCIS) is the most appropriate one for large enterprises because it includes concrete implementation recommendations and covers the most important requirements. In terms of best-fitting tools, Experian Aperture Data Studio (EADS) and Great Expectations (GX) fulfilled most of the defined requirements. A detailed recommendation on best-fitting data quality methodologies and tools is given at the end of this thesis.