Architectural-Space Exploration of Heterogeneous Reliability and Checkpointing Modes for Out-of-Order Superscalar Processors

Prabakaran, Bharath Srinivas; Dave, Mihika; Kriebel, Florian; Rehman, Semeen; Shafique, Muhammad

doi:10.1109/ACCESS.2019.2945622

DC Field

Value

Language

dc.contributor.author

Prabakaran, Bharath Srinivas

dc.contributor.author

Dave, Mihika

dc.contributor.author

Kriebel, Florian

dc.contributor.author

Rehman, Semeen

dc.contributor.author

Shafique, Muhammad

dc.date.accessioned

2020-06-27T17:53:03Z

dc.date.issued

2019

dc.identifier.citation

Prabakaran, B. S., Dave, M., Kriebel, F., Rehman, S., & Shafique, M. (2019). Architectural-Space Exploration of Heterogeneous Reliability and Checkpointing Modes for Out-of-Order Superscalar Processors. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2945622

dc.identifier.issn

2169-3536

dc.identifier.uri

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:3-8519

dc.identifier.uri

http://hdl.handle.net/20.500.12708/781

dc.description.abstract

State-of-the-art reliability techniques deploy full-scale redundancy, such as double or triple modular redundancy (DMR, TMR), at different layers of the computing stack to detect and/or correct transient faults. However, techniques that rely on full-scale redundancy incur significant area, performance, and/or power overheads, which may be impractical under system constraints such as deadlines or the power budget available to the full chip (or to a single processor core). In this work, we propose a novel design methodology to generate and explore the architectural space of heterogeneous reliability modes for out-of-order superscalar multi-core processors. These heterogeneous modes offer different trade-offs between reliability and power/area, from which an optimal configuration can be chosen at run time to meet the reliability requirements of a given system while reducing the corresponding power overhead (or, solving the inverse problem, to maximize reliability under a given power constraint). Our experimental results show that a Pareto-optimal heterogeneous reliability mode reduces core vulnerability by 87% on average across multiple application workloads, with area and power overheads of 10% and 43%, respectively. To further extend the design space of heterogeneous reliability modes, we investigate the effectiveness of combining different processor-state compression techniques, such as Distributed Multi-threaded Checkpointing (DMTCP), Hash-based Incremental Checkpointing (HBICT), and GNU zip (gzip), so that the correct processor state can be recovered once a fault is detected. Using a combination of these state compression techniques, we reduced checkpoint sizes by a factor of about 6×.
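The abstract describes choosing, at run time, the reliability mode that meets a reliability target at the lowest power cost, or the inverse (the most reliable mode within a power budget). The following is a minimal Python sketch of that selection step, assuming each mode is summarized by three normalized numbers (vulnerability, power overhead, area overhead). The class, the function names, and all values are hypothetical placeholders for illustration; they are not the paper's implementation or its measured results.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReliabilityMode:
    """A hypothetical hardening configuration of one processor core.

    vulnerability:  normalized core vulnerability (1.0 = unprotected baseline)
    power_overhead: extra power as a fraction of the baseline core
    area_overhead:  extra area as a fraction of the baseline core
    """
    name: str
    vulnerability: float
    power_overhead: float
    area_overhead: float


def _objectives(m: ReliabilityMode) -> Tuple[float, float, float]:
    # All three objectives are minimized.
    return (m.vulnerability, m.power_overhead, m.area_overhead)


def dominates(a: ReliabilityMode, b: ReliabilityMode) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in one."""
    pa, pb = _objectives(a), _objectives(b)
    return all(x <= y for x, y in zip(pa, pb)) and any(x < y for x, y in zip(pa, pb))


def pareto_front(modes: List[ReliabilityMode]) -> List[ReliabilityMode]:
    """Modes not dominated by any other mode, ordered by power overhead."""
    front = [m for m in modes if not any(dominates(o, m) for o in modes)]
    return sorted(front, key=lambda m: m.power_overhead)


def min_power_mode(modes: List[ReliabilityMode],
                   max_vulnerability: float) -> Optional[ReliabilityMode]:
    """Lowest-power Pareto mode meeting a vulnerability target, or None if none does."""
    feasible = [m for m in pareto_front(modes) if m.vulnerability <= max_vulnerability]
    return min(feasible, key=lambda m: m.power_overhead, default=None)


def max_reliability_mode(modes: List[ReliabilityMode],
                         power_budget: float) -> Optional[ReliabilityMode]:
    """Inverse problem: least vulnerable Pareto mode within a power budget, or None."""
    feasible = [m for m in pareto_front(modes) if m.power_overhead <= power_budget]
    return min(feasible, key=lambda m: m.vulnerability, default=None)


# Hypothetical, made-up values for illustration only; NOT results from the paper.
EXAMPLE_MODES = [
    ReliabilityMode("baseline", 1.00, 0.00, 0.00),
    ReliabilityMode("mode_a", 0.60, 0.12, 0.03),
    ReliabilityMode("mode_b", 0.40, 0.30, 0.06),
    ReliabilityMode("mode_c", 0.45, 0.35, 0.08),  # dominated by mode_b
    ReliabilityMode("full_redundancy", 0.05, 1.10, 1.00),
]

if __name__ == "__main__":
    print([m.name for m in pareto_front(EXAMPLE_MODES)])
    print(min_power_mode(EXAMPLE_MODES, max_vulnerability=0.5))
    print(max_reliability_mode(EXAMPLE_MODES, power_budget=0.2))
```

Restricting both searches to the Pareto front reflects the idea that only non-dominated modes are worth offering at run time. In a real system the three numbers per mode would presumably come from vulnerability (e.g., AVF) analysis and hardware synthesis, which this sketch does not model.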

dc.language

English

dc.language.iso

dc.publisher

IEEE

dc.relation.ispartof

IEEE Access

dc.rights.uri

http://creativecommons.org/licenses/by/4.0/

dc.subject

Reliability

dc.subject

multi-cores

dc.subject

heterogeneity

dc.subject

fault-tolerance

dc.subject

AVF

dc.subject

hardening

dc.subject

microprocessors

dc.subject

superscalar

dc.subject

resilience

dc.subject

design space exploration

dc.subject

checkpointing

dc.subject

out-of-order

dc.subject

architecture

dc.title

Architectural-Space Exploration of Heterogeneous Reliability and Checkpointing Modes for Out-of-Order Superscalar Processors

dc.type

Article

dc.rights.license

Creative Commons Attribution 4.0 International

dc.contributor.affiliation

University of Illinois Urbana-Champaign, United States of America

dc.relation.grantno

German Research Foundation (DFG)

dc.relation.grantno

"Dependable Embedded Systems", SPP 1500

dc.rights.holder

The Author(s) 2019

dc.type.category

Original Research Article

tuw.journal.peerreviewed

true

tuw.peerreviewed

true

tuw.version

vor

dcterms.isPartOf.title

IEEE Access

tuw.publication.orgunit

E191 - Institute of Computer Engineering

tuw.publication.orgunit

E384 - Institute of Computer Technology

tuw.publisher.doi

10.1109/ACCESS.2019.2945622

dc.identifier.eissn

2169-3536

dc.identifier.libraryid

AC15576021

dc.identifier.urn

urn:nbn:at:at-ubtuw:3-8519

tuw.author.orcid

0000-0003-0557-2166

dc.rights.identifier

CC BY 4.0

wb.sci

true

item.openairetype

research article

item.cerifentitytype

Publications

item.grantfulltext

open

item.languageiso639-1

item.openairecristype

http://purl.org/coar/resource_type/c_2df8fbb1

item.openaccessfulltext

Open Access

item.fulltext

with Fulltext

crisitem.author.dept

E191-02 - Research Unit of Embedded Computing Systems

crisitem.author.dept

University of Illinois Urbana-Champaign

crisitem.author.dept

E191-02 - Research Unit of Embedded Computing Systems

crisitem.author.dept

E384 - Institute of Computer Technology

crisitem.author.dept

E191-02 - Research Unit of Embedded Computing Systems

crisitem.author.parentorg

E191 - Institute of Computer Engineering

crisitem.author.parentorg

E191 - Institute of Computer Engineering

crisitem.author.parentorg

E350 - Faculty of Electrical Engineering and Information Technology

crisitem.author.parentorg

E191 - Institute of Computer Engineering

Appears in Collections:

Article

Prabakaran Bharath Srinivas - 2019 - Architectural-Space Exploration of...pdf

Adobe PDF

(2.98 MB)
