TY - GEN
T1 - NCPU
T2 - 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
AU - Jia, Tianyu
AU - Ju, Yuhao
AU - Joseph, Russ
AU - Gu, Jie
N1 - Funding Information:
ACKNOWLEDGMENT We wish to thank the anonymous reviewers for their helpful feedback. We thank Kendall Kuzminskas for her contribution on preparing some testing programs. This paper is supported in part by the National Science Foundation under grant number CCF-1908488.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
N2 - Machine learning inference has become an essential task for embedded edge devices requiring the deployment of costly deep neural network accelerators onto extremely resource-constrained hardware. Although many optimization strategies have been proposed to improve the efficiency of standalone accelerators, the optimization for end-to-end performance of a computing device with heterogeneous cores is still challenging and often overlooked, especially for low power devices. In this paper, we propose a unified reconfigurable architecture, referred as Neural CPU (NCPU), for low-cost embedded systems. The proposed architecture is built on a binary neural network accelerator with the capability to emulate an in-order RISC-V CPU pipeline. The NCPU supports flexible programmability of RISC-V and maintains data locally to avoid costly core-to-core data transfer. A two-core NCPU SoC is designed and fabricated in a 65nm CMOS process. Compared with the conventional heterogeneous architecture, a single NCPU achieves 35% area reduction and 12% energy saving at 0.4V, which is suitable for low power and low-cost embedded edge devices. The NCPU design also features the capability of smooth switching between general-purpose CPU operation and a binary neural network inference to realize full utilization of the cores. The implemented two-core NCPU SoC achieves an end-to-end performance speed-up of 43% or an equivalent 74% energy saving based on use cases of real-time image classification and motion detection.
AB - Machine learning inference has become an essential task for embedded edge devices requiring the deployment of costly deep neural network accelerators onto extremely resource-constrained hardware. Although many optimization strategies have been proposed to improve the efficiency of standalone accelerators, the optimization for end-to-end performance of a computing device with heterogeneous cores is still challenging and often overlooked, especially for low power devices. In this paper, we propose a unified reconfigurable architecture, referred as Neural CPU (NCPU), for low-cost embedded systems. The proposed architecture is built on a binary neural network accelerator with the capability to emulate an in-order RISC-V CPU pipeline. The NCPU supports flexible programmability of RISC-V and maintains data locally to avoid costly core-to-core data transfer. A two-core NCPU SoC is designed and fabricated in a 65nm CMOS process. Compared with the conventional heterogeneous architecture, a single NCPU achieves 35% area reduction and 12% energy saving at 0.4V, which is suitable for low power and low-cost embedded edge devices. The NCPU design also features the capability of smooth switching between general-purpose CPU operation and a binary neural network inference to realize full utilization of the cores. The implemented two-core NCPU SoC achieves an end-to-end performance speed-up of 43% or an equivalent 74% energy saving based on use cases of real-time image classification and motion detection.
KW - Binary neural network
KW - Embedded systems
KW - End-to-end performance
KW - RISC-V
KW - Reconfigurable architecture
KW - SoC silicon validation
KW - Ultra-low power device
UR - http://www.scopus.com/inward/record.url?scp=85097338395&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097338395&partnerID=8YFLogxK
U2 - 10.1109/MICRO50266.2020.00091
DO - 10.1109/MICRO50266.2020.00091
M3 - Conference contribution
AN - SCOPUS:85097338395
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 1097
EP - 1109
BT - Proceedings - 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
PB - IEEE Computer Society
Y2 - 17 October 2020 through 21 October 2020
ER -