TY - GEN
T1 - 15.3 A 65nm 3T Dynamic Analog RAM-Based Computing-in-Memory Macro and CNN Accelerator with Retention Enhancement, Adaptive Analog Sparsity and 44TOPS/W System Energy Efficiency
AU - Chen, Zhengyu
AU - Chen, Xi
AU - Gu, Jie
N1 - Funding Information:
This work was supported in part by the National Science Foundation under grant number CCF-1846424.
Publisher Copyright:
© 2021 IEEE.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/2/13
Y1 - 2021/2/13
N2 - Computing-In-Memory (CIM) techniques which incorporate analog computing inside memory macros have shown significant advantages in computing efficiency for deep learning applications. While earlier CIM macros were limited by lower bit precision, e.g. binary weights in [1], recent works have shown 4-to-8b precision for the weights/inputs and up to 20b for the output values [2], [3]. Sparsity and application features have also been exploited at the system level to further improve the computation efficiency [4], [5]. To enable higher precision, bit-wise operations were commonly utilized [3], [4]. However, there are limitations in existing solutions using the bit-wise operations with SRAM cells. Fig. 15.3.1 shows the summary of challenges and solutions in this work. First, all existing solutions utilize 6T/8T/10T SRAM as a CIM cell, which fundamentally limits the size of the CIM array. In this work, we replace the commonly used SRAM cell with a 3-transistor (3T) analog memory cell, referred as dynamic-analog-RAM (DARAM) which represents a 4b weight value as an analog voltage. This leads to \sim 10 \times reduction in transistor count and achieves an effective CIM single-bit area smaller than the foundry-supplied 6T SRAM cell. Secondly, as no bit-wise calculation is needed in this work, only single-phase MAC operations are performed, removing the throughput degradation associated with previous multi-phase approaches and digital accumulation in [3], [4]. Furthermore, analog linearity issues are mitigated by highly linear time-based activation, removal of matching requirements for critical multi-bit caps [4], [6], and a special read current compensation technique. Thirdly, to mitigate the power bottleneck of ADC or SA, this work applies analog sparsity-based low-power methods, which include a compute-adaptive ADC skipping operation when the analog MAC value is small (or 'sparse') and a special weight-shifting technique, leading to an additional \sim 2 \times reduction in CIM-macro power. We demonstrate the proposed techniques using a 65nm CIM-based CNN accelerator showing state-of-art energy efficiency.
AB - Computing-In-Memory (CIM) techniques which incorporate analog computing inside memory macros have shown significant advantages in computing efficiency for deep learning applications. While earlier CIM macros were limited by lower bit precision, e.g. binary weights in [1], recent works have shown 4-to-8b precision for the weights/inputs and up to 20b for the output values [2], [3]. Sparsity and application features have also been exploited at the system level to further improve the computation efficiency [4], [5]. To enable higher precision, bit-wise operations were commonly utilized [3], [4]. However, there are limitations in existing solutions using the bit-wise operations with SRAM cells. Fig. 15.3.1 shows the summary of challenges and solutions in this work. First, all existing solutions utilize 6T/8T/10T SRAM as a CIM cell, which fundamentally limits the size of the CIM array. In this work, we replace the commonly used SRAM cell with a 3-transistor (3T) analog memory cell, referred as dynamic-analog-RAM (DARAM) which represents a 4b weight value as an analog voltage. This leads to \sim 10 \times reduction in transistor count and achieves an effective CIM single-bit area smaller than the foundry-supplied 6T SRAM cell. Secondly, as no bit-wise calculation is needed in this work, only single-phase MAC operations are performed, removing the throughput degradation associated with previous multi-phase approaches and digital accumulation in [3], [4]. Furthermore, analog linearity issues are mitigated by highly linear time-based activation, removal of matching requirements for critical multi-bit caps [4], [6], and a special read current compensation technique. Thirdly, to mitigate the power bottleneck of ADC or SA, this work applies analog sparsity-based low-power methods, which include a compute-adaptive ADC skipping operation when the analog MAC value is small (or 'sparse') and a special weight-shifting technique, leading to an additional \sim 2 \times reduction in CIM-macro power. We demonstrate the proposed techniques using a 65nm CIM-based CNN accelerator showing state-of-art energy efficiency.
UR - http://www.scopus.com/inward/record.url?scp=85102386701&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102386701&partnerID=8YFLogxK
U2 - 10.1109/ISSCC42613.2021.9366045
DO - 10.1109/ISSCC42613.2021.9366045
M3 - Conference contribution
AN - SCOPUS:85102386701
T3 - Digest of Technical Papers - IEEE International Solid-State Circuits Conference
SP - 240
EP - 242
BT - 2021 IEEE International Solid-State Circuits Conference, ISSCC 2021 - Digest of Technical Papers
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Solid-State Circuits Conference, ISSCC 2021
Y2 - 13 February 2021 through 22 February 2021
ER -