DC Field | Value | Language
dc.contributor.author | Prabakaran, Bharath Srinivas | -
dc.contributor.author | Dave, Mihika | -
dc.contributor.author | Kriebel, Florian | -
dc.contributor.author | Rehman, Semeen | -
dc.contributor.author | Shafique, Muhammad | -
dc.date.accessioned | 2020-06-27T17:53:03Z | -
dc.date.issued | 2019 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:3-8519 | -
dc.identifier.uri | http://hdl.handle.net/20.500.12708/781 | -
dc.description.abstract | State-of-the-art reliability techniques and mechanisms deploy full-scale redundancy, like double or triple modular redundancy (DMR, TMR), on different layers of the computing stack to detect and/or correct transient faults. However, the techniques relying on full-scale redundancy incur significant area, performance, and/or power overheads, which might not always be feasible/practical due to system constraints such as deadlines and available power budget for the full chip (or a processor core). In this work, we propose a novel design methodology to generate and explore the architectural space of heterogeneous reliability modes for out-of-order superscalar multi-core processors. These heterogeneous modes enable varying reliability and power/area trade-offs, from which an optimal configuration can be chosen at run time to meet the reliability requirements of a given system while reducing the corresponding power overheads (or solving the inverse problem, i.e., maximizing the reliability under a given power constraint). Our experimental results show that a Pareto-optimal heterogeneous reliability mode reduces the core vulnerability by 87%, on average, across multiple application workloads, with area and power overheads of 10% and 43%, respectively. To further enhance the design space of heterogeneous reliability modes, we investigate the effectiveness of combining different processor state compression techniques like Distributed Multi-threaded Checkpointing (DMTCP), Hash-based Incremental Checkpointing (HBICT) and GNU zip, such that the correct processor state can be recovered once a fault is detected. We reduced the checkpoint sizes by a factor of ~6× using a unique combination of different state compression techniques. | en
dc.language | English | -
dc.language.iso | en | -
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | -
dc.relation.ispartof | IEEE Access | -
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | -
dc.subject | Reliability | en
dc.subject | multi-cores | en
dc.subject | heterogeneity | en
dc.subject | fault-tolerance | en
dc.subject | AVF | en
dc.subject | hardening | en
dc.subject | microprocessors | en
dc.subject | superscalar | en
dc.subject | resilience | en
dc.subject | design space exploration | en
dc.subject | checkpointing | en
dc.subject | out-of-order | en
dc.subject | architecture | en
dc.title | Architectural-Space Exploration of Heterogeneous Reliability and Checkpointing Modes for Out-of-Order Superscalar Processors | en
dc.type | Article | en
dc.type | Artikel | de
dc.relation.grantno | Dependable Embedded Systems, SPP 1500 | -
dc.rights.holder | The Author(s) 2019 | -
dc.type.category | Research Article | en
dc.type.category | Forschungsartikel | de
tuw.version | vor | -
dcterms.isPartOf.title | IEEE Access | -
tuw.publication.orgunit | E191 - Institut für Computer Engineering | -
tuw.publisher.doi | 10.1109/ACCESS.2019.2945622 | -
dc.identifier.libraryid | AC15576021 | -
dc.identifier.urn | urn:nbn:at:at-ubtuw:3-8519 | -
dc.rights.identifier | CC BY 4.0 | -
item.languageiso639-1 | en | -
item.openairetype | Article | -
item.openairetype | Artikel | -
item.fulltext | with Fulltext | -
item.cerifentitytype | Publications | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.grantfulltext | open | -
Appears in Collections: Article

This item is licensed under a Creative Commons License (CC BY 4.0).