Barisits, M.-S. (2017). Hybrid simulation models for data-intensive systems [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2017.44841
Data-intensive systems; Simulation; System modelling; Machine learning
en
Abstract:
Data-intensive systems are used to access and store massive amounts of data by combining the storage resources of multiple data centers, usually deployed all over the world, in one system. This enables users to utilize these massive storage capabilities in a simple and efficient way. However, as these systems grow, it becomes hard to estimate the effects of modifications to the system, such as new data placement algorithms or hardware upgrades, and to validate these changes for potential side effects. This thesis addresses the modeling of operational data-intensive systems and presents a novel simulation model which estimates the performance of system operations. The running example used throughout this thesis is the data-intensive system Rucio, which is used as the data management system of the ATLAS experiment at CERN's Large Hadron Collider. Existing system models in the literature are not applicable to data-intensive workflows, as they only consider computational workflows or make assumptions which do not hold for operational systems. A hybrid modeling approach is proposed which addresses the limits of these models. It partitions the system into discrete components, creates models for these components, and combines them into one concise system model. Each component model is built solely on observed data metrics, such as system traces. The identification of which system components to model and which ones to omit is based on a quantitative system analysis of the Rucio data-intensive system. The storage, network, data integrity validation, and services components were identified. An existing model from the literature was utilized for the network component. For the other components, models based on machine learning techniques are created and evaluated against historic workloads from the running example. The component models are unified in an event simulator and evaluated against historic workloads from the Rucio data-intensive system.
The hybrid system model achieves a median relative evaluation error of 22%.
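
As an illustration of the reported metric, the following minimal sketch computes a median relative error between simulated and observed values. The abstract does not define the metric, so the formula used here, the median of |simulated - observed| / |observed|, is an assumption, and the function name and sample numbers are hypothetical and not taken from the thesis.

```python
from statistics import median


def median_relative_error(simulated, observed):
    """Median of |simulated - observed| / |observed| over paired samples.

    Assumed definition for illustration only; the thesis may define its
    evaluation error differently.
    """
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    # Skip pairs whose observed value is zero, since the ratio is undefined.
    errors = [abs(s - o) / abs(o) for s, o in zip(simulated, observed) if o != 0]
    if not errors:
        raise ValueError("no pairs with a non-zero observed value")
    return median(errors)


# Hypothetical operation durations in seconds (not data from the thesis).
observed_durations = [10.0, 20.0, 40.0]
simulated_durations = [12.0, 15.0, 40.0]
print(median_relative_error(simulated_durations, observed_durations))
```

The median is less sensitive than the mean to a few badly mispredicted operations, which may be one reason to report it for heavy-tailed workloads.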