Multi-bit error tolerant caches using two-dimensional error coding

Jangwoo Kim*, Nikos Hardavellas, Ken Mai, Babak Falsafi, James C. Hoe

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

186 Citations (Scopus)

Abstract

In deep sub-micron ICs, growing amounts of on-die memory and scaling effects make embedded memories increasingly vulnerable to reliability and yield problems. As scaling progresses, soft and hard errors in the memory system will increase and single error events are more likely to cause large-scale multi-bit errors. However, conventional memory protection techniques can neither detect nor correct large-scale multi-bit errors without incurring large performance, area, and power overheads. We propose two-dimensional (2D) error coding in embedded memories, a scalable multi-bit error protection technique to improve memory reliability and yield. The key innovation is the use of vertical error coding across words that is used only for error correction in combination with conventional per-word horizontal error coding. We evaluate this scheme in the cache hierarchies of two representative chip multiprocessor designs and show that 2D error coding can correct clustered errors up to 32x32 bits with significantly smaller performance, area, and power overheads than conventional techniques.
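
The split between horizontal detection and vertical correction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 16-bit word width, the 4-way interleaved parity used as the horizontal code, the single XOR parity row used as the vertical code, and all names (`h_code`, `ProtectedArray`) are assumptions chosen for brevity. With one vertical parity row, this toy version can repair only one corrupted word at a time.

```python
from functools import reduce
from operator import xor

WORD_BITS = 16   # assumed word width (illustrative)
INTERLEAVE = 4   # assumed number of interleaved horizontal parity bits


def h_code(word: int) -> int:
    """Horizontal check bits: parity bit i covers word bits i, i+4, i+8, i+12."""
    code = 0
    for shift in range(0, WORD_BITS, INTERLEAVE):
        code ^= word >> shift
    return code & ((1 << INTERLEAVE) - 1)


class ProtectedArray:
    """Toy memory array with per-word (horizontal) and cross-word (vertical) codes."""

    def __init__(self, words):
        self.words = list(words)
        # Horizontal code: stored per word, checked on every read (detection only).
        self.h_codes = [h_code(w) for w in self.words]
        # Vertical code: one parity row, the XOR of all words (correction only).
        self.v_parity = reduce(xor, self.words, 0)

    def write(self, idx: int, value: int) -> None:
        # Remove the old word's contribution from the vertical parity, add the new one.
        self.v_parity ^= self.words[idx] ^ value
        self.words[idx] = value
        self.h_codes[idx] = h_code(value)

    def read(self, idx: int) -> int:
        word = self.words[idx]
        if h_code(word) == self.h_codes[idx]:
            return word  # common case: the vertical code is never consulted
        # Horizontal code flagged an error: rebuild the word from the
        # vertical parity row and every other word in the array.
        others = reduce(
            xor, (w for i, w in enumerate(self.words) if i != idx), 0
        )
        recovered = self.v_parity ^ others
        self.words[idx] = recovered
        return recovered
```

In this sketch, flipping a burst of up to four adjacent bits in one stored word (for example `arr.words[1] ^= 0x00F0`) is caught by the interleaved parity on the next `read`, and the word is rebuilt from the vertical parity row. Reads of clean words never consult the vertical code, which mirrors the abstract's point that vertical coding is used only for correction.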

Original language: English (US)
Title of host publication: Proceedings of the The 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007
Pages: 197-209
Number of pages: 13
DOI: https://doi.org/10.1109/MICRO.2007.19
State: Published - Dec 1 2007
Event: 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007 - Chicago, IL, United States
Duration: Dec 1 2007 to Dec 5 2007

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
ISSN (Print): 1072-4451

Other

Other: 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007
Country: United States
City: Chicago, IL
Period: 12/1/07 to 12/5/07

Fingerprint

Coding errors
Data storage equipment
Error correction
Innovation

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Kim, J., Hardavellas, N., Mai, K., Falsafi, B., & Hoe, J. C. (2007). Multi-bit error tolerant caches using two-dimensional error coding. In Proceedings of the The 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007 (pp. 197-209). [4408256] (Proceedings of the Annual International Symposium on Microarchitecture, MICRO). https://doi.org/10.1109/MICRO.2007.19
Kim, Jangwoo ; Hardavellas, Nikos ; Mai, Ken ; Falsafi, Babak ; Hoe, James C. / Multi-bit error tolerant caches using two-dimensional error coding. Proceedings of the The 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007. 2007. pp. 197-209 (Proceedings of the Annual International Symposium on Microarchitecture, MICRO).
@inproceedings{a7c913e00d314d18af3aaef2633f77c3,
title = "Multi-bit error tolerant caches using two-dimensional error coding",
abstract = "In deep sub-micron ICs, growing amounts of on-die memory and scaling effects make embedded memories increasingly vulnerable to reliability and yield problems. As scaling progresses, soft and hard errors in the memory system will increase and single error events are more likely to cause large-scale multi-bit errors. However, conventional memory protection techniques can neither detect nor correct large-scale multi-bit errors without incurring large performance, area, and power overheads. We propose two-dimensional (2D) error coding in embedded memories, a scalable multi-bit error protection technique to improve memory reliability and yield. The key innovation is the use of vertical error coding across words that is used only for error correction in combination with conventional per-word horizontal error coding. We evaluate this scheme in the cache hierarchies of two representative chip multiprocessor designs and show that 2D error coding can correct clustered errors up to 32x32 bits with significantly smaller performance, area, and power overheads than conventional techniques.",
author = "Jangwoo Kim and Nikos Hardavellas and Ken Mai and Babak Falsafi and Hoe, {James C.}",
year = "2007",
month = "12",
day = "1",
doi = "10.1109/MICRO.2007.19",
language = "English (US)",
isbn = "0769530478",
series = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",
pages = "197--209",
booktitle = "Proceedings of the The 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007",

}

Kim, J, Hardavellas, N, Mai, K, Falsafi, B & Hoe, JC 2007, Multi-bit error tolerant caches using two-dimensional error coding. in Proceedings of the The 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007., 4408256, Proceedings of the Annual International Symposium on Microarchitecture, MICRO, pp. 197-209, 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007, Chicago, IL, United States, 12/1/07. https://doi.org/10.1109/MICRO.2007.19

Multi-bit error tolerant caches using two-dimensional error coding. / Kim, Jangwoo; Hardavellas, Nikos; Mai, Ken; Falsafi, Babak; Hoe, James C.

Proceedings of the The 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007. 2007. p. 197-209 4408256 (Proceedings of the Annual International Symposium on Microarchitecture, MICRO).

TY - GEN

T1 - Multi-bit error tolerant caches using two-dimensional error coding

AU - Kim, Jangwoo

AU - Hardavellas, Nikos

AU - Mai, Ken

AU - Falsafi, Babak

AU - Hoe, James C.

PY - 2007/12/1

Y1 - 2007/12/1

N2 - In deep sub-micron ICs, growing amounts of on-die memory and scaling effects make embedded memories increasingly vulnerable to reliability and yield problems. As scaling progresses, soft and hard errors in the memory system will increase and single error events are more likely to cause large-scale multi-bit errors. However, conventional memory protection techniques can neither detect nor correct large-scale multi-bit errors without incurring large performance, area, and power overheads. We propose two-dimensional (2D) error coding in embedded memories, a scalable multi-bit error protection technique to improve memory reliability and yield. The key innovation is the use of vertical error coding across words that is used only for error correction in combination with conventional per-word horizontal error coding. We evaluate this scheme in the cache hierarchies of two representative chip multiprocessor designs and show that 2D error coding can correct clustered errors up to 32x32 bits with significantly smaller performance, area, and power overheads than conventional techniques.

AB - In deep sub-micron ICs, growing amounts of on-die memory and scaling effects make embedded memories increasingly vulnerable to reliability and yield problems. As scaling progresses, soft and hard errors in the memory system will increase and single error events are more likely to cause large-scale multi-bit errors. However, conventional memory protection techniques can neither detect nor correct large-scale multi-bit errors without incurring large performance, area, and power overheads. We propose two-dimensional (2D) error coding in embedded memories, a scalable multi-bit error protection technique to improve memory reliability and yield. The key innovation is the use of vertical error coding across words that is used only for error correction in combination with conventional per-word horizontal error coding. We evaluate this scheme in the cache hierarchies of two representative chip multiprocessor designs and show that 2D error coding can correct clustered errors up to 32x32 bits with significantly smaller performance, area, and power overheads than conventional techniques.

UR - http://www.scopus.com/inward/record.url?scp=47349100793&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=47349100793&partnerID=8YFLogxK

U2 - 10.1109/MICRO.2007.19

DO - 10.1109/MICRO.2007.19

M3 - Conference contribution

AN - SCOPUS:47349100793

SN - 0769530478

SN - 9780769530475

T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

SP - 197

EP - 209

BT - Proceedings of the The 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007

ER -

Kim J, Hardavellas N, Mai K, Falsafi B, Hoe JC. Multi-bit error tolerant caches using two-dimensional error coding. In Proceedings of the The 40th IEEE/ACM International Symposium on Microarchitecture, MICRO 2007. 2007. p. 197-209. 4408256. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO). https://doi.org/10.1109/MICRO.2007.19