Stoilkovska, I. (2021). Modeling and verification of synchronous fault-tolerant distributed algorithms [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.90331
Distributed algorithms; Fault tolerance; Modeling; Model checking; Abstraction; Threshold automata
en
Abstract:
Distributed systems have a wide range of applications: from autonomous vehicles, via data centers, to cryptocurrencies. A distributed system consists of multiple processes, which execute an algorithm locally and coordinate globally in order to reach a common goal. During the lifetime of a distributed system, some of its processes can exhibit faulty behaviors, whose severity and occurrence are unpredictable. To ensure that the system works correctly when some of its processes are faulty, distributed algorithms are designed to be fault-tolerant. Design of fault-tolerant distributed algorithms has been an active field in the area of theoretical computer science for over 40 years. As these algorithms are implemented in systems where a high degree of availability and reliability is expected, their correctness, as well as the correctness of their implementations, is paramount. The design and implementation of fault-tolerant distributed algorithms are led by algorithm designers and by software engineers, respectively. Both tasks are quite challenging, and thus susceptible to human error. Automated verification provides means for finding bugs at early stages in the process of designing and implementing correct fault-tolerant distributed algorithms. However, several characteristics, inherent to distributed systems, represent obstacles for the direct application of automated verification techniques. Moreover, fault-tolerant distributed algorithms are parameterized, in the sense that their execution is influenced by a set of parameters, whose values are a priori unknown, such as, e.g., the number n of processes. Thus, reasoning about the correctness of such an algorithm amounts to reasoning about the correctness of infinitely many finite instances of the algorithm. In the formal methods literature, this problem is called the parameterized verification problem, and is undecidable in general. In this thesis, we propose methods for formal specification and techniques for automated verification of fault-tolerant distributed algorithms. We focus on synchronous algorithms, whose computations are organized in rounds, in which the processes execute the algorithm synchronously. To solve the parameterized verification problem, we (i) propose formal models for the behavior of a process running a synchronous fault-tolerant distributed algorithm, (ii) define techniques for reducing certain instances of the parameterized verification problem for synchronous fault-tolerant distributed algorithms to some decidable problem, and (iii) show the usefulness our techniques by experimental evaluation, i.e., by automatically verifying several synchronous fault-tolerant distributed algorithms from the literature.