Self-stabilizing Byzantine fault-tolerant clock distribution in grids

Perner, Martin

doi:10.34726/hss.2013.21901

Record link:

https://doi.org/10.34726/hss.2013.21901
http://hdl.handle.net/20.500.12708/2480

Title:

Self-stabilizing Byzantine fault-tolerant clock distribution in grids

Citation:

Perner, M. (2013). Self-stabilizing Byzantine fault-tolerant clock distribution in grids [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2013.21901

reposiTUm DOI:

10.34726/hss.2013.21901

CatalogPlus:

AC11203476

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Perner, Martin

Advisor:

Schmid, Ulrich

Organisational Unit:

E182 - Institut für Technische Informatik

Date (published):

2013

Number of Pages:

Abstract:

This thesis presents design and analysis of a self-stabilizing Byzantine fault-tolerant clock distribution scheme HEX, which allows to distribute a clock signal in a hexagonal grid topology. Typical application domains are VLSI circuits, multi-core processors and other parallel/networked system architectures that require accurately synchronized clocks at neighbor nodes, e.g., for synchronous communication. In sharp contrast to clock trees, which are commonly used for this purpose, HEX tolerates both persistent and transient faults of intermediate nodes and wires and supports multiple synchronized clock sources, as, e.g., used in the multi-synchronous GALS (globally asynchronous locally synchronous) approach. To achieve this, every node in the HEX grid is running a very simple distributed algorithm that forwards clock ticks and also provides the synchronized clock signal locally. A VHDL implementation of the entire HEX algorithm is presented, which also incorporates a digitally controlled clock multiplier. By means of a comprehensive custom testbed, which also includes fault injection, HEX grids of variable size and with different delay parameters could be instantiated, simulated and post-processed. The entire design has been synthesized with the UMC 90 nm ASIC standard cell library, thereby generating a model that could be simulated using Mentor Graphics ModelSim. Comprehensive experiments have been conducted to verify and complement the results of the theoretical worst-case analysis of the achievable synchronization accuracy (clock skew) and the stabilization time, which are also documented in this thesis. In particular, a suite of experiments revealed that the quite exotic worst-case scenarios are extremely unlikely to occur in practice, such that the typical average clock skew is much better than the worst-case. Experiments involving faulty nodes allowed us to also shed light on the excellent properties of HEX in the presence of a substantial number of failures, where analytic results are not available. The results of this work, which has been supported by the Austrian Science Fund (FWF) project FATAL (P21694), have also been published at the 6th International Conference on Dependability (DEPEND'13) and the 25th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'13); a comprehensive journal version is currently under review.

Diese Diplomarbeit präsentiert Design und Analyse eines selbststabilisierenden, Byzantinisch fehlertoleranten Verfahrens (HEX) zur Taktverteilung in einer hexagonalen Grid-Topologie. Typische Anwendungsgebiete sind VLSI-Schaltungen, multi-core Prozessoren und andere parallele/netzwerkgekoppelte Systemarchitekturen, die, z.B. zu Kommunikationszwecken, genau synchronisierte Taktsignale in enachbarten Knoten benötigen. Im Gegensatz zur Taktverteilung mittels einer Baumtopologie, die normalerweise hierfür verwendet wird, toleriert HEX sowohl persistente als auch transiente Fehler von Zwischenknoten und Verbindungsleitungen und unterstützt mehrfache synchronisierte Taktquellen, wie sie etwa in multisynchronen GALS (global asynchronen lokal synchronen) Architekturen verwendet werden. Um das zu bewerkstelligen, läuft auf jedem Knoten im HEX-Grid ein sehr einfacher verteilter Algorithmus, der Takte weiterleitet und auch lokal zur Verfügung stellt. Zentraler Gegenstand der Arbeit ist eine VHDL-Implementierung des HEX-Algorithmus, die auch einen digital kontrollierten Taktmultiplizierer beinhaltet. Ein speziell entwickeltes Testbed, das auch Mechanismen zur Fehlerinjektion bereitstellt, erlaubt die Instantiierung, Simulation und das Post-Processing von HEX-Grids mit unterschiedlicher Größe und Zeitparametern. Das gesamte Design wurde mittels der UMC 90 nm ASIC-Standardzellen-Bibliothek synthetisiert, um ein für die Simulation mittels Mentor Graphics's ModelSim geeignetes Modell zu generieren. Umfassende Experimente wurden durchgeführt, um die Resultate der ebenfalls in dieser Arbeit dokumentierten theoretischen Worst-Case-Analyse der Synchronisationsgenauigkeit (Skew) und der Stabilisierungszeit zu verifizieren und, insbesondere, zu ergänzen. Diese bestätigten, dass die ziemlich exotischen Worst-Case Szenarien in der Praxis extrem unwahrscheinlich sind, sodass der typische mittlere Skew viel geringer als der Worst-Case ist. Experimente mit fehlerhaften Knoten, wo analytische Resultate nicht verfügbar sind, zeigten, dass HEX auch mit einer großen Anzahl fehlerhafter Knoten im Grid hervorragende Eigenschaften aufweist. Die Resultate dieser Arbeit, die vom Österreichischen Fonds zur Förderung der wissenschaftlichen Forschung (FWF) im Rahmen des Projekts FATAL (P21694) unterstützt wurde, konnten auch in den Proceedings der 6th International Conference on Dependability (DEPEND'13) und des 25th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'13) publiziert werden; eine umfassende Journal-Version ist mittlerweile in Begutachtung.

Additional information:

Abweichender Titel laut Übersetzung der Verfasserin/des Verfassers
Zsfassung in dt. Sprache

License:

In Copyright

Appears in Collections:

Thesis