An adaptive clock scheme exploiting instruction-based dynamic timing slack for a GPGPU architecture

Tianyu Jia*, Yijie Wei, Russ Joseph, Jie Gu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

This article presents an adaptive clock scheme that exploits instruction-based dynamic timing slack (DTS) for a general-purpose graphics processor unit (GPGPU) architecture. Based on the developed transitional static timing analysis, the deterministic DTS can be identified for each instruction at different pipeline stages. A critical path (CP) messenger scheme was designed to monitor the runtime utilization of CPs. Both real-time information on issued instructions and the CP messengers are used to determine the runtime DTS margin and guide the cycle-by-cycle clock period adjustment. To apply the proposed adaptive clock to a GPGPU, a hierarchical clocking scheme is built that includes a global phase-locked loop (PLL) and a local delay-locked loop (DLL)-based clock generator inside each compute unit (CU). Each CU core has its own clock domain with adjustable local clocking. In addition, to exploit the error-resilient characteristics of neural networks, an elastic pipeline clocking scheme is developed to redistribute the timing margin across pipeline stages for machine learning computations. Measurement results from the open-source GPGPU architecture implemented in a 65 nm CMOS process demonstrate that up to 18% performance improvement, or an equivalent 30% energy saving, can be obtained by exploiting the deterministic instruction-based DTS. The proposed elastic pipeline clocking can gain an additional 8% energy saving with small accuracy degradation for neural network inference operations.
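
The cycle-by-cycle period adjustment described in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the instruction classes, the delay values, the `cp_active` flag standing in for a CP messenger, and the function names are all hypothetical. Each cycle, the clock period is set to the longest delay among the instructions in flight, falling back to the worst-case period whenever the messenger flag is raised.

```python
# Illustrative sketch only: names and numbers below are made up for the
# example and are not taken from the article's measurements.

WORST_CASE_PERIOD_NS = 10.0  # assumed fixed-clock period (hypothetical)

# Hypothetical longest path delay exercised by each instruction class (ns).
INSTRUCTION_DELAY_NS = {
    "mul": 10.0,
    "add": 8.5,
    "mov": 7.0,
}


def cycle_period(in_flight, cp_active=False):
    """Return the clock period to use for one cycle.

    in_flight: instruction classes currently occupying pipeline stages.
    cp_active: hypothetical stand-in for a critical-path messenger flag;
               when set, the worst-case period is used.
    """
    if cp_active or not in_flight:
        return WORST_CASE_PERIOD_NS
    # Unknown instruction classes conservatively get the worst-case period.
    return max(INSTRUCTION_DELAY_NS.get(op, WORST_CASE_PERIOD_NS)
               for op in in_flight)


def total_time(trace):
    """Sum the per-cycle periods over a trace of (in_flight, cp_active) pairs."""
    return sum(cycle_period(ops, cp) for ops, cp in trace)


# A four-cycle hypothetical trace.
trace = [
    (["add", "mov"], False),  # period limited by "add"
    (["mov"], False),         # period limited by "mov"
    (["mul", "add"], False),  # "mul" needs the full period
    (["add"], True),          # messenger flag forces the worst case
]

adaptive_ns = total_time(trace)
fixed_ns = WORST_CASE_PERIOD_NS * len(trace)
print(f"fixed clock: {fixed_ns} ns, adaptive clock: {adaptive_ns} ns")
```

With these made-up delays the adaptive trace finishes sooner than the fixed-clock one; the size of the gain depends entirely on the assumed numbers and says nothing about the measured results reported above.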

Original language: English (US)
Article number: 9044316
Pages (from-to): 2259-2269
Number of pages: 11
Journal: IEEE Journal of Solid-State Circuits
Volume: 55
Issue number: 8
State: Published - Aug 2020

Funding

Manuscript received October 13, 2019; revised February 1, 2020; accepted February 28, 2020. Date of publication March 23, 2020; date of current version July 23, 2020. This article was approved by Associate Editor Edith Beigné. This work was supported in part by the National Science Foundation under Grant CCF-1618065. (Corresponding author: Tianyu Jia.) The authors are with the Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208 USA (e-mail: [email protected]).

Keywords

  • Adaptive clocking
  • delay-locked loop (DLL)
  • dynamic timing slack (DTS)
  • general-purpose graphics processor unit (GPGPU)
  • hierarchical clocking architecture
  • neural network resiliency
  • phase-locked loop (PLL)

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
