Bors, C. (2019). Facilitating data quality assessment utilizing visual analytics: tackling time, metrics, uncertainty, and provenance [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.76147
E193 - Institute of Visual Computing and Human-Centered Technology
Date (published):
2019
Number of Pages:
231
Keywords:
data quality assessment; provenance; metrics; visual analytics; uncertainty
Language:
en
Abstract:
Visual and interactive data analysis is a large field of research whose methods are successfully used in commercial tools and systems to allow analysts to make sense of their data. Data is often riddled with issues that make analysis difficult or even infeasible, and pre-processing data for downstream analysis involves resolving these issues. Visual Analytics methods can be employed to identify and correct issues and eventually wrangle the data into a usable format. Several aspects are critical during issue correction: (1) how the issues are resolved, (2) to what extent this affected the dataset, and (3) whether the routines used actually resolved the issues appropriately. In this thesis I employ data quality metrics and uncertainty to capture provenance from pre-processing operations and pipelines. Data quality metrics are used to show the prevalence of errors in a dataset, and uncertainty can quantify the changes applied to data values and entries during processing. Capturing such measures as provenance and visualizing them in an exploratory environment allows analysts to determine how pre-processing steps affected a dataset and whether the initially discovered issues could be resolved in a minimal way, so that the data remains representative of the original dataset. In the course of this thesis I employed a user-centered design methodology to develop Visual Analytics prototypes and visualization techniques that combine approaches from data quality, provenance, and uncertainty research.
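The dissertation defines its own metrics and provenance model; purely as an illustration of the idea sketched in the abstract, the hypothetical Python example below (not taken from the thesis) shows how a simple completeness metric and a change-based uncertainty score could be recorded as a provenance entry for one pre-processing step.

```python
import pandas as pd

# Hypothetical sketch: a completeness metric and a change-based uncertainty
# score recorded as provenance for one pre-processing (imputation) step.
# The metric names and formulas are illustrative, not those defined in the thesis.

def completeness(df: pd.DataFrame) -> float:
    """Fraction of non-missing cells in the dataset (a simple quality metric)."""
    return 1.0 - df.isna().to_numpy().mean()

def change_ratio(before: pd.DataFrame, after: pd.DataFrame) -> float:
    """Fraction of cells altered by a processing step (a simple uncertainty proxy)."""
    changed = (before != after) & ~(before.isna() & after.isna())
    return changed.to_numpy().mean()

# Raw data with a missing age and a missing city.
raw = pd.DataFrame({"age": [25, None, 61], "city": ["Vienna", "Graz", None]})

# One cleaning step: impute missing ages with the column median.
cleaned = raw.copy()
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

# Provenance record for this step: which operation ran, how the quality
# metric changed, and how much of the data the step touched.
provenance_entry = {
    "operation": "impute_age_median",
    "completeness_before": completeness(raw),
    "completeness_after": completeness(cleaned),
    "change_ratio": change_ratio(raw, cleaned),
}
print(provenance_entry)
```

Collecting one such entry per pipeline step would yield a provenance log that could then be visualized and explored, which is the kind of workflow the abstract describes at a conceptual level.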