Embedded systems are at the heart of modern technology, powering applications ranging from autonomous vehicles to medical devices and Internet of Things (IoT) solutions. These systems face unique constraints, such as limited energy budgets, real-time performance requirements, and stringent reliability demands. As processing capabilities continue to grow, the separation of memory and processing in traditional von Neumann architectures presents a significant challenge for embedded systems: the memory wall, the delay incurred by moving data between the processor and memory. This issue, rooted in the disparity between processor speeds and memory access latencies, exacerbates inefficiencies in embedded systems, where both energy and computational resources are scarce. As Richard Sites observed in 1996: "Today’s chips are largely able to execute code faster than we can feed them with instructions and data. The real design action is in memory subsystems—caches, buses, bandwidth, and latency." This observation still holds in modern computing, particularly for embedded systems, where the combination of constrained resources and real-time performance demands amplifies the inefficiencies introduced by the memory wall.

To address these challenges, novel approaches have redefined how data is processed within memory systems. Among these, Processing in Memory (PIM) architectures have emerged as a transformative solution, integrating computation directly within memory to minimize data movement and the associated energy consumption per operation. By reducing reliance on the traditional memory hierarchy, PIM alleviates the delays caused by the gap between processing speed and memory access latency while improving energy efficiency, an essential requirement in embedded systems. Despite its promise, existing PIM-based techniques face critical limitations in real-world deployments. For example, Spin Transfer Torque Random Access Memory (STT-RAM), a leading non-volatile memory technology for PIM, suffers from high write energy and latency due to stochastic switching behavior and process variations. Similarly, Static Random Access Memory (SRAM), widely employed in embedded accelerators, experiences inaccuracies in in-memory computations caused by nonlinearities in the Bit-Line (BL) discharge process (a simple sketch of this effect follows this overview). These challenges undermine the dependability and energy efficiency of PIM solutions, preventing them from realizing their full potential in memory-centric applications such as artificial intelligence and limiting their ability to meet the real-time control and high-performance computing demands of embedded systems, where speed and reliability are essential.

To address these challenges, this thesis proposes a cross-layer approach to dependable and energy-efficient design for embedded systems, leveraging both volatile and non-volatile memory technologies. By addressing the unique constraints and performance demands of embedded systems, the thesis presents several innovative PIM-based techniques and system-level tools that overcome these limitations and advance the field of embedded computing. In particular, PIM-based systems face robustness challenges, and the lack of comprehensive modeling and benchmarking frameworks complicates their effective development and utilization.
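To make the BL nonlinearity concrete, the short Python sketch below contrasts an ideal, linear bit-line discharge with a toy RC-style exponential model of an SRAM-based analog dot product. All names and constants here (VDD, TAU, the discharge functions) are illustrative assumptions for exposition only; they are not the circuits or models developed in this thesis.

```python
import numpy as np

# Illustrative only (not the thesis circuit): compare an ideal, linear
# bit-line (BL) discharge against a simple exponential RC-style discharge
# for an SRAM-based analog dot product. Constants are assumed values.

VDD = 1.0   # supply voltage (V), assumed
TAU = 8.0   # BL discharge scale in "unit dot product" terms, assumed

def ideal_bl_drop(acts, weights):
    """Ideal behaviour: BL voltage drop proportional to the dot product."""
    return (VDD / TAU) * np.dot(acts, weights)

def nonlinear_bl_drop(acts, weights):
    """Toy RC-style model: discharge saturates, so large dot products
    are compressed relative to the ideal linear response."""
    dot = np.dot(acts, weights)
    return VDD * (1.0 - np.exp(-dot / TAU))

rng = np.random.default_rng(0)
acts = rng.integers(0, 2, size=16)        # binary activations
weights = rng.uniform(0.0, 1.0, size=16)  # analog-encoded weights

ideal = ideal_bl_drop(acts, weights)
real = nonlinear_bl_drop(acts, weights)
print(f"ideal drop {ideal:.3f} V, nonlinear drop {real:.3f} V, "
      f"error {100 * abs(real - ideal) / ideal:.1f} %")
```

The exponential model compresses larger dot products relative to the linear one; this kind of distortion is what motivates linearizing the BL discharge in in-memory multiplication accelerators.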
To address these research gaps, this thesis makes the following contributions: a) developing advanced modeling and benchmarking frameworks for PIM-based systems; and b) designing robust and efficient PIM circuit architectures for improved reliability and performance. Through these key contributions, the thesis advances the field by offering both system-level tools for evaluating STT-RAM architectures and practical solutions for enhancing the efficiency, accuracy, and reliability of SRAM-based memory accelerators. These innovations collectively address long-standing challenges in memory-centric computing systems. The contributions span both Volatile Memories (VMs) and Non-Volatile Memories (NVMs).

1. Volatile Memory
• OPTIMA: A modeling framework for rapid design-space exploration of SRAM-based accelerators, addressing the circuit nonlinearities and power variations critical to embedded system design.
• AID: A circuit design technique that linearizes BL discharge in SRAM, significantly improving accuracy and reducing energy consumption in in-memory multiplication accelerators.
• EMAC: A method leveraging digital-to-time Word-Line (WL) modulation and logical weight encoding to enhance energy efficiency and accuracy in analog SRAM-based Multiplication and Accumulation (MAC) accelerators, with minimal accuracy degradation in real-world embedded applications (a conceptual sketch of the time-domain MAC principle follows this list).

2. Non-Volatile Memory
• An open-source, extendable STT-RAM memory controller integrated into the gem5 simulator, enabling evaluations of power, latency, and throughput to guide optimization strategies.
• A novel write optimization technique combining stochastic switching and circuit-level approximations to reduce write energy and latency while enhancing robustness against soft errors and process variations.
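As a rough illustration of the digital-to-time principle referenced in the EMAC contribution, the Python sketch below models WL pulse-width modulation and charge-domain accumulation for a MAC operation. The constants (I_UNIT, T_LSB, C_BL) and the function time_domain_mac are assumptions made purely for exposition; they do not describe the EMAC circuit itself.

```python
import numpy as np

# Illustrative sketch (assumed, not the EMAC design): a digital-to-time
# converter turns each digital activation into a word-line (WL) pulse whose
# width is proportional to its value; during that pulse, each selected cell
# sinks a current proportional to its stored weight, so the charge removed
# from the bit line accumulates the products.

I_UNIT = 1e-6   # cell current per unit weight (A), assumed
T_LSB = 1e-9    # WL pulse width per activation LSB (s), assumed
C_BL = 50e-15   # bit-line capacitance (F), assumed

def time_domain_mac(acts, weights):
    """Charge-domain accumulation: Q = sum_i (w_i * I_UNIT) * (a_i * T_LSB)."""
    pulse_widths = np.asarray(acts, dtype=float) * T_LSB      # per activation
    cell_currents = np.asarray(weights, dtype=float) * I_UNIT # per cell
    charge = np.sum(cell_currents * pulse_widths)             # BL charge
    return charge / C_BL                                      # voltage drop

acts = [3, 1, 0, 2]              # 2-bit digital activations
weights = [0.5, 1.0, 0.75, 0.25] # analog-encoded weights
print(f"BL voltage drop = {time_domain_mac(acts, weights):.4f} V "
      f"(proportional to the MAC result {np.dot(acts, weights):.2f})")
```

The point of the sketch is only that encoding activations as pulse widths and weights as cell currents makes the accumulated BL charge proportional to the dot product, which is the property such time-domain MAC accelerators exploit.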