Perner, M. (2019). Fault-tolerant clock distribution in grid-like networks [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.72064
This thesis investigates how to design a system to distribute and use a clock signal in a grid-like network, as employed in, e.g., SoCs and NoCs. Ideally, the clock is generated by multiple components and has self-stabilization and fault-tolerance properties, i.e., it recovers from transient faults even under permanent faults. These clock generation components can be distributed over the whole chip to provide a clock signal for every part of the chip. This, however, comes at a price: The increase in wire length between the components increases the skew between the generating components, and hence the clock skew at the clock boundaries between the components. Furthermore, such clock generation systems are expensive in terms of chip area, and usually require a fully-connected topology. Hence, putting the clock generating components in close proximity is favorable. However, just generating such a clock somewhere in a chip is not enough: After all, the clock signal needs to be distributed to all components that need it. The challenge for clock distribution is to not diminish or even remove the self-stabilization and fault-tolerance properties of the generated clock during distribution. To this end, we present a clock distribution system, HEX, that allows recovery from transient faults under permanent faults. Since HEX, which is based on a hexagonal grid, shows a large spread in the node-to-node skew under faults, we explored alternative interconnection topologies. Under the assumption that a suitable clock generation is able to provide a skew bounded by the difference in the wire delay between its components, the TRIX topology has been identified as the best trade-off between performance and implementation efficiency. For TRIX, a transistor cell model of the clock distribution node has been constructed as well. Given that the distributed clock propagates like a wave through the grid, it already provides a synchronization source for the nodes of the grid.
However, the achievable degree of synchrony is not sufficient to use the synchronous design paradigm for communication between the nodes of the grid. We show that high-speed communication is nevertheless feasible in such multi-synchronous GALS architectures by using a FIFO buffer to mitigate the clock skew, thus allowing data transmission in every clock cycle. Using a special buffer management approach, our communication scheme can be guaranteed to be self-stabilizing when the underlying clocking system is. Hence, our communication scheme is fully compatible with, but not limited to, HEX and TRIX.