Abstract
This article presents an adaptive clock scheme to exploit instruction-based dynamic timing slack (DTS) for a general-purpose graphics processor unit (GPGPU) architecture. Based on the developed transitional static timing analysis, the deterministic DTS can be identified for each instruction at different pipeline stages. A critical path (CP) messenger scheme was designed to monitor the runtime utilization of CPs. Both real-time issued instruction information and CP messengers are utilized to determine the runtime DTS margin and guide the cycle-by-cycle clock period adjustment. To apply the proposed adaptive clock on GPGPU, a hierarchical clocking scheme is built, including a global phase-locked loop (PLL) and a local delay-locked loop (DLL)-based clock generator inside each compute unit (CU). Each CU core contains its own clock domain with adjustable local clocking. In addition, to exploit the error-resilient characteristics of neural networks, an elastic pipeline clocking scheme is developed to redistribute the timing margin across pipeline stages for machine learning computations. Measurement results from the implemented open-source GPGPU architecture on a 65 nm CMOS process demonstrate that up to 18% performance improvement, or an equivalent 30% energy saving, can be obtained by exploiting the deterministic instruction-based DTS. The proposed elastic pipeline clocking can gain an additional 8% energy saving with small accuracy degradation for neural network inference operations.
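The cycle-by-cycle period adjustment described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the three-stage pipeline, the instruction classes, and every delay value are hypothetical, and the CP messengers and the DLL-based clock generator are not modeled. It only shows the basic idea that, if the worst-case delay of each instruction at each stage is known ahead of time, the clock period for a given cycle can be set to the slowest stage currently in use rather than to the global worst case.

```python
from typing import Dict, List, Optional

# Hypothetical worst-case delay (ns) of each instruction class at each of
# three pipeline stages. These numbers are illustrative assumptions only.
STAGE_DELAY_NS: Dict[str, List[float]] = {
    "add":  [2.0, 2.5, 2.0],
    "mul":  [2.0, 3.0, 2.0],
    "load": [2.0, 2.2, 2.8],
    "nop":  [1.5, 1.5, 1.5],
}
N_STAGES = 3

# A conventional fixed clock must cover the slowest path of any instruction.
WORST_CASE_NS = max(max(delays) for delays in STAGE_DELAY_NS.values())


def cycle_periods(program: List[str]) -> List[float]:
    """Return the clock period chosen for every cycle of an in-order pipeline.

    Each cycle, the period is the largest delay among the instructions
    currently occupying the pipeline stages (empty stages are ignored).
    """
    pipeline: List[Optional[str]] = [None] * N_STAGES
    periods: List[float] = []
    # Feed the program, then (N_STAGES - 1) bubbles to drain the pipeline.
    stream: List[Optional[str]] = list(program) + [None] * (N_STAGES - 1)
    for instr in stream:
        pipeline = [instr] + pipeline[:-1]  # advance every instruction one stage
        delays = [
            STAGE_DELAY_NS[op][stage]
            for stage, op in enumerate(pipeline)
            if op is not None
        ]
        periods.append(max(delays))
    return periods


def speedup(program: List[str]) -> float:
    """Ratio of fixed worst-case run time to adaptive-clock run time."""
    periods = cycle_periods(program)
    return (len(periods) * WORST_CASE_NS) / sum(periods)


if __name__ == "__main__":
    demo = ["add", "mul", "load", "add"]
    print("per-cycle periods (ns):", cycle_periods(demo))
    print("speedup over fixed clock:", round(speedup(demo), 3))
```

In this toy model the gain depends entirely on the assumed delay table and instruction mix, so it says nothing about the measured 18% figure reported for the 65 nm implementation.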
Original language | English (US) |
---|---|
Article number | 9044316 |
Pages (from-to) | 2259-2269 |
Number of pages | 11 |
Journal | IEEE Journal of Solid-State Circuits |
Volume | 55 |
Issue number | 8 |
DOIs | |
State | Published - Aug 2020 |
Funding
Manuscript received October 13, 2019; revised February 1, 2020; accepted February 28, 2020. Date of publication March 23, 2020; date of current version July 23, 2020. This article was approved by Associate Editor Edith Beigné. This work was supported in part by the National Science Foundation under Grant CCF-1618065. (Corresponding author: Tianyu Jia.) The authors are with the Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208 USA (e-mail: [email protected]).
Keywords
- Adaptive clocking
- delay-locked loop (DLL)
- dynamic timing slack (DTS)
- general-purpose graphics processor unit (GPGPU)
- hierarchical clocking architecture
- neural network resiliency
- phase-locked loop (PLL)
ASJC Scopus subject areas
- Electrical and Electronic Engineering