Abstract
This article presents a deep neural network (DNN) accelerator that uses an adaptive clocking technique, the elastic clock chain, to exploit the dynamic timing margin of a 2-D processing element (PE) array. Exploiting dynamic timing margin in modern deep learning accelerators faces two major challenges: the margin diminishes on a large array, and timing depends strongly on runtime operands. To address them, we propose an elastic clock chain that provides flexible multi-domain clock management for in situ compute adaptability. Specifically, the 2-D PE array is divided into 16 clock domains whose clock periods are adjusted dynamically based on both runtime instructions and operands. The multi-domain clock sources are generated by a multi-phase delay-locked loop (DLL) and delivered over a global clock bus. The clock offsets between neighboring domains are deliberately managed to keep the clock domains synchronized. A 16 × 8 PE array that supports different DNN dataflows and bit precisions was fabricated in a 65-nm CMOS process. Measurements on the MNIST and CIFAR-10 data sets show that enabling the elastic clock chain improves the effective operating frequency by up to 19% for a single instruction, multiple data (SIMD) dataflow. This performance improvement was converted into an energy saving of up to 34%. Compared with the SIMD dataflow, the systolic dataflow shows a smaller performance improvement of up to 11%, because all in-flight operand values must be considered.
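The clocking idea in the abstract can be illustrated with a small behavioral model. The following is only a sketch under stated assumptions, not the circuit described in the article: the number of DLL phases (`phases=8`), the neighbor offset bound (`max_offset=2` phase steps), the rule of resolving offset violations by stretching the earlier clock edge, and all function names are hypothetical choices made for illustration.

```python
import math


def quantize(required_fraction, phases):
    """Round a required period (a fraction of the worst-case period)
    up to a whole number of DLL phase steps (hypothetical granularity)."""
    return max(1, min(phases, math.ceil(required_fraction * phases)))


def elastic_chain(required, phases=8, max_offset=2):
    """Toy model of a chain of clock domains.

    required[c][d] is the delay that domain d needs in cycle c, as a
    fraction of the worst-case clock period. Time is counted in DLL
    phase steps. Returns the clock edge time of each domain after the
    last cycle.
    """
    n = len(required[0])
    edges = [0] * n
    for cycle in required:
        # Each domain proposes its next edge from its own operands/instruction.
        nxt = [edges[d] + quantize(cycle[d], phases) for d in range(n)]
        # Assumed synchronization rule: neighboring edges may differ by at
        # most max_offset steps; an edge that is too early is delayed.
        # Delaying only lengthens a period, so timing is never violated.
        for d in range(1, n):
            nxt[d] = max(nxt[d], nxt[d - 1] - max_offset)
        for d in range(n - 2, -1, -1):
            nxt[d] = max(nxt[d], nxt[d + 1] - max_offset)
        edges = nxt
    return edges


def speedup(required, phases=8, max_offset=2):
    """Ratio of fixed worst-case clocking time to elastic clocking time."""
    edges = elastic_chain(required, phases, max_offset)
    return len(required) * phases / max(edges)


if __name__ == "__main__":
    # Four domains, two cycles; the slow operation moves from domain 0 to 3.
    demo = [[1.0, 0.5, 0.5, 0.5],
            [0.5, 0.5, 0.5, 1.0]]
    print(elastic_chain(demo))
    print(speedup(demo))
```

In this toy model a domain with short-delay operands can advance its clock edge early, but only as far as the offset bound to its neighbors allows, which is one way to see why a larger array or a dataflow with more in-flight operands leaves less margin to exploit.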
Original language | English (US) |
---|---|
Article number | 9222216 |
Pages (from-to) | 55-65 |
Number of pages | 11 |
Journal | IEEE Journal of Solid-State Circuits |
Volume | 56 |
Issue number | 1 |
State | Published - Jan 2021 |
Funding
Manuscript received April 25, 2020; revised July 8, 2020 and September 2, 2020; accepted September 27, 2020. Date of publication October 13, 2020; date of current version December 24, 2020. This article was approved by Guest Editor Ping-Hsuan Hsieh. This work was supported in part by the National Science Foundation under Grant CCF-1618065. (Corresponding author: Tianyu Jia.) Tianyu Jia was with the Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208 USA. He is now with the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138 USA.
Keywords
- Adaptive clocking
- deep neural network (DNN) accelerator
- dynamic timing margin
- multiple clock domains
- processing element (PE)
- systolic array
ASJC Scopus subject areas
- Electrical and Electronic Engineering