Weise, M., & Rauber, A. (2024). DBRepo: A Data Repository System for Research Data in Databases. In 2024 IEEE International Conference on Big Data (BigData) (pp. 322–331). https://doi.org/10.1109/BigData62323.2024.10825401
Research Data Repositories; Relational Databases; Research Data Management
In the era of big data, research has become increasingly data-driven, with vast amounts of information being generated and analyzed to produce new insights and discoveries. This data deluge requires a combination of methods and technologies to store, process, share, and preserve research data. Much of the world's most valuable data is stored in relational databases, where it evolves over time as new knowledge is gained and old knowledge is invalidated. Yet current repository systems fail to provide researchers with interfaces for conveniently working with this kind of data within their research environments. For this reason, we have developed DBRepo, an institutional repository for research data in databases that supports the guidelines of the Research Data Alliance (RDA) Working Group on Data Citation. The system has been in use at TU Wien for almost three years, provides a variety of data science-related interfaces, and can be integrated into many workflows and tools. Further, it assists researchers in depositing their datasets by suggesting the table schema (column names, data types, and primary key constraints), and it addresses data interoperability issues by suggesting semantic concepts for dataset columns and, where applicable, units of measurement. DBRepo is currently in use at six universities worldwide, which use it as a data store for both hot and cold research datasets. In this paper, we describe their use cases and provide lessons learned from the various deployments and workflows. Finally, we show how depositing research data into DBRepo increases the data's visibility.