TY - GEN
T1 - Cocoa
T2 - 2018 International Symposium on Memory Systems, MEMSYS 2018
AU - Yan, Chao
AU - Joseph, Russell E
PY - 2018/10/1
Y1 - 2018/10/1
N2 - Voltage scaling is one of the most effective techniques for saving power chip-wide. However, reducing supply voltage in the presence of process variation introduces significant reliability challenges for large SRAM arrays. The rapid decrease in SRAM reliability at low voltages determines a minimum operating voltage (Vccmin), which places a floor on the power consumption of the entire processor. Processors must therefore either stay at high voltages, where the low defect densities are tolerable, or operate at low voltages with existing fault-tolerance schemes that sacrifice a large fraction of the cache capacity for improved reliability. In the latter case, processors can obtain significant energy reductions from aggressive voltage scaling, but the decrease in last-level cache (LLC) capacity may have a detrimental impact on both performance and energy consumption. In this work, we propose a novel technique, synergistic cache compression and error correction (Cocoa), that enables reliable low-voltage operation while preserving LLC capacity. Cocoa applies compression to LLCs to harvest additional capacity for storing ECC check bits. In addition, we introduce a new ECC scheme that minimizes the space overhead of ECC by employing variable-strength ECC on a per-segment basis: in the common case, segments with zero- or one-bit failures are protected by weak ECC, while the few segments with multi-bit failures are protected by strong ECC. Our evaluation of a 2MB 8-way set-associative L3 cache shows that the proposed schemes can guarantee reliable cache operation beyond 400mV. Compared to a conventional cache with a Vccmin of 810mV, our schemes reduce the system energy per instruction (Core+L3+DRAM) by 54.5%. Furthermore, when operating at 400mV, we achieve 0.8% higher performance than even a fault-free cache. We also demonstrate that our energy benefits remain robust across a broad range of cache sizes and supply voltages.
KW - Cache compression
KW - Error correcting code
KW - Fault tolerance caches
KW - Voltage scaling
UR - http://www.scopus.com/inward/record.url?scp=85060977430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060977430&partnerID=8YFLogxK
U2 - 10.1145/3240302.3240304
DO - 10.1145/3240302.3240304
M3 - Conference contribution
AN - SCOPUS:85060977430
T3 - ACM International Conference Proceeding Series
BT - MEMSYS 2018 - Proceedings of the International Symposium on Memory Systems
PB - Association for Computing Machinery
Y2 - 1 October 2018 through 4 October 2018
ER -