Reduction techniques for parameterized model checking and synthesis of fault-tolerant distributed algorithms

Lazić, Marijana

doi:10.34726/hss.2019.67803

Datensatz Zitierlink:

https://doi.org/10.34726/hss.2019.67803
http://hdl.handle.net/20.500.12708/11417

Titel:

Reduction techniques for parameterized model checking and synthesis of fault-tolerant distributed algorithms

Zitat:

Lazić, M. (2019). Reduction techniques for parameterized model checking and synthesis of fault-tolerant distributed algorithms [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.67803

reposiTUm-DOI:

10.34726/hss.2019.67803

CatalogPlus:

AC15391402

Publikationstyp:

Hochschulschrift - Dissertation

Sprache:

Englisch

Autor_innen:

Lazić, Marijana

Betreuer_in:

Widder, Josef

Organisationseinheit:

E192 - Institut für Logic and Computation

Datum (veröffentlicht):

2019

Umfang:

208

Keywords:

parameterized model checking; program synthesis; fault-tolerant distributed algorithms; threshold automata; Byzantine faults; randomized distributed algorithms

Abstract:

Distributed algorithms have many applications in everyday life, as well as in avionic and automotive embedded systems, computer networks, and the Internet of Things. The central idea is to achieve dependability by replication, and to ensure that all correct replicas behave as one, even if some of the replicas fail. In this way, the correct operation of the system is more reliable than the correct operation of its parts. Fault-tolerant algorithms are typically used in applications where highest reliability is required because human lives might be at risk, such as in automotive or avionic industries, and even unlikely failures of the system are not acceptable. Therefore, correctness of fault-tolerant algorithms is of utmost importance. New application domains, such as cloud computing or blockchain technologies, provide new motivation to study fault-tolerant algorithms. With the huge number of computers involved, faults are the norm rather than the exception. Consequently, fault-tolerance becomes an economic necessity, and so does the correctness of mechanisms for fault-tolerance. In order to deal with real-life distributed systems with thousands of components, we need techniques for analyzing fault-tolerant distributed algorithms for every system size, that is, for every number of components and every admissible number of faults. We use special variables n, f and t, called parameters, to describe the number of processes in the system, the number of faults, and an upper bound on the number of faults, respectively. An algorithm is considered to be correct if it satisfies its specifications for every n; f; t, under a certain prerequisite, e.g., n > 3t, called resilience condition. Verifying such general correctness is called parameterized verification, and synthesizing correct algorithms from their specifications is in this case called parameterized synthesis. Reasoning about correctness of fault-tolerant distributed algorithms (FTDAs) is an immensely difficult task, due to the non-deterministic behavior caused by the presence of faults, concurrency, message delays, etc. The classical approach toward correctness consists of pencil-and-paper mathematical proofs, that require human ingenuity and huge manual eorts. As distributed algorithms show complex behavior, and are difficult to understand for human engineers, there is only very limited tool support to catch logical errors in fault-tolerant distributed algorithms, especially at design time. In this thesis we focus on the class of asynchronous FTDAs where processes communicate by message passing and compare numbers of received messages against so-called thresholds. For instance, a process might be allowed to perform an action only if it has received messages from a majority of processes in the system, i.e., at least n 2 messages. Thresholds are typically linear combinations of parameters, and thus they constitute a parameterization of the code itself. For asynchronous FTDAs, most existing verification and synthesis techniques are either only able to deal with small instances, for example, with four processes among which one might fail, or they cannot deal with (i) parameterized code, (ii) arithmetic resilience conditions, or (iii) specifications of distributed algorithms. In this thesis we develop techniques for the parameterized analysis of this class of FTDAs that can cope with all the mentioned challenges. More specifically, our methods are based on reduction techniques, that is, on reasoning about dependency of concurrently executed events. The contribution of this thesis is threefold: (1) We present a complete framework for the parameterized verification of temporal properties of this particular class of FTDAs, that is, a method for certifying the correctness of an algorithm for any system size. Our method addresses both safety and liveness speci cations of algorithms. (2) We introduce an automated technique for synthesizing FTDAs that are correct for any system size, rather than for a fixed size. In fact, for a given so-called skeleton of an algorithm from our class, we automatically detect thresholds, parameterized in the number of processes in total and the number of faults, that yield correct algorithms. (3) Additionally, we extend the class of the considered algorithms, by allowing probabilistic transitions, such as coin tosses, and specifications that hold with probability 1. We reduce the problem of veri cation of such randomized FTDAs to the non-probabilistic setting. In this way, we present the first automated method for parameterized verification of randomized fault-tolerant distributed algorithms and their associated probabilistic properties.

Lizenz:

Urheberrechtsschutz

Enthalten in den Sammlungen:

Thesis