Kulmukhametov, A., Rauber, A., & Becker, C. (2021). Improving data quality in large-scale repositories through conflict resolution. In International Journal on Digital Libraries (pp. 365–383). Springer-Verlag. https://doi.org/10.1007/s00799-021-00311-0
E194-01 - Forschungsbereich Information und Software Engineering
International Journal on Digital Libraries
Library and Information Sciences; Data quality; Technical metadata; Digital curation; Conflict resolution; Content profiling
Digital repositories rely on technical metadata to manage their objects. The output of characterization tools is aggregated and analyzed through content profiling. The accuracy and correctness of characterization tools vary; they frequently produce contradictory outputs, and the resulting metadata conflicts limit scalable preservation risk assessment and repository management. This article presents and evaluates a rule-based approach to improving data quality in this scenario through expert-conducted conflict resolution. We characterize the data quality challenges and present a method for developing conflict resolution rules to improve data quality. We evaluate the method and the resulting data quality improvements in an experiment on a publicly available document collection. The results demonstrate that our approach enables the effective resolution of conflicts by producing rules that reduce the number of conflicts in the data set from 17% to 3%. This replicable method presents a significant improvement in content profiling technology for digital repositories, since the enhanced data quality can improve risk assessment and preservation management in digital repository systems.