Symbolic verification of TLA+ specifications with applications to distributed algorithms

Tran, Thanh Hai

doi:10.34726/hss.2024.117518

Record link:

https://doi.org/10.34726/hss.2024.117518
http://hdl.handle.net/20.500.12708/193082

Title:

Symbolic verification of TLA+ specifications with applications to distributed algorithms

Citation:

Tran, T. H. (2023). Symbolic verification of TLA+ specifications with applications to distributed algorithms [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.117518

reposiTUm DOI:

10.34726/hss.2024.117518

CatalogPlus:

AC17055314

Publication Type:

Thesis - Dissertation

Language:

English

Authors:

Tran, Thanh Hai

Advisor:

Konnov, Igor

Co-advisor:

Widder, Josef

Organisational Unit:

E192 - Institut für Logic and Computation

Date (published):

2023

Number of Pages:

193

Keywords:

TLA+; APALACHE; symbolic model checking; cutoff; partial synchrony; failure detector; interactive theorem prover; IVy

Abstract:

TLA+ is a language for formal specification of concurrent and distributed protocols. TLA+ is extremely concise yet expressive: The language primitives include Booleans, integers, functions, tuples, records, sequences, and sets thereof, which can be also nested. This is probably why the only model checker for TLA+ (called TLC) relies on explicit enumeration of values and states. This thesis has two main parts. First, we bring symbolic verification to the specification language TLA+. Second, we focus on formal verification techniques for the partial synchrony model of distributed computations. We demonstrate our methodology and approaches to the second part by conducting a case study about the Chandra and Toueg failure detector. Part 1. In this thesis, we first bring symbolic verification to TLA+ specifications by developing the symbolic model checker called APALACHE. Like TLC, APALACHE assumes that all specification parameters are fixed and all states are finite structures. Unlike TLC, APALACHE translates the underlying transition relation into quantifier-free SMT constraints, which allows us to exploit the power of SMT solvers. Designing this translation is the central challenge that we address in the first part of this thesis. Our experiments show that APALACHE outperforms TLC on checking inductive invariants and finding counterexamples in instances with large state spaces. In TLA+, a specification is written as a logical formula without assignments and other imperative statements. To improve APALACHE's performance, we introduce an automatic technique to slice a TLA+ specification in symbolic transitions, which are used as an input to APALACHE. In contrast to TLC, our technique does not explicitly evaluate expressions, but it reduces the problem of finding symbolic transitions to the satisfiability of an SMT formula. Part 2. In the second part of this thesis, we focus on symbolic verification techniques for partial synchrony that is a model of computation in distributed algorithms and modern blockchains. In this model, correctness of algorithms requires the existence of bounds ∆ on message delays and φ on the relative speed of processes after reaching Global Stabilization Time (GST). This makes partially synchronous algorithms parametricin time bounds. Moreover, partially synchronous algorithms are parameterized in the number of processes. Therefore, in general we cannot verify all instances of a partially synchronous distributed algorithm by applying APALACHE or other state-of-the-art model checkers, e.g., TLC, Spin, or NuSMV.The failure detector of Chandra and Toueg is a well-known algorithm to detect crashed processes in a system. Importantly, correctness of the failure detector is based only on the existence of bounds ∆ and φ after GST. Hence, we choose the failure detector as a case study in our thesis. To verify all instances of the failure detector, we develop the two following techniques.While the general parameterized verification is undecidable, many distributed algorithms such as mutual exclusion, cache coherence, and distributed consensus enjoy the cutoff property, which reduces the parameterized verification problem to verification of a finite number of instances. The failure detector does not fall into one of the known classes since they typically rely on point-to-point communication and timeouts. In this thesis, we formalize this communication structure and introduce the class of symmetric point-to-point algorithms. We show that the symmetric point-to-point algorithms have a cutoff. By these results, it is sufficient to verify the failure detector by checking instances with only two processes. We do parametric verification of both safety and liveness of the failure detector in three frameworks: TLA+, counter automata, and IVy. We introduce encoding techniques and an abstraction of in-transit messages to efficiently specify the failure detector in each framework. By running the model checkers for TLA+ (TLC and APALACHE) and counter automata (FAST), we prove safety for fixed time bounds. This helps us to find the inductive invariants for fixed parameters, which we used as a starting point for the proofs with IVy. By running IVy, we prove safety for arbitrary time bounds. Moreover, we show how to verify liveness of the failure detector by reducing the verification problem to safety verification. Thus, both properties are verified by developing inductive invariants with IVy. We conjecture that correctness of other partially synchronous algorithms may be proven by following the presented methodology.

License:

In Copyright

Appears in Collections:

Thesis