TY - GEN
T1 - 19.4 An Adaptive Clock Management Scheme Exploiting Instruction-Based Dynamic Timing Slack for a General-Purpose Graphics Processor Unit with Deep Pipeline and Out-of-Order Execution
AU - Jia, Tianyu
AU - Joseph, Russell E
AU - Gu, Jie
N1 - Funding Information:
This work was supported in part by the National Science Foundation under grant numbers CCF-1116610 and CCF-1618065.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/3/6
Y1 - 2019/3/6
N2 - Cycle-by-cycle dynamic timing slack (DTS), which represents extra timing margin from the critical-path timing slack reported by the static timing analysis (STA), has been observed at both program level and instruction level. Conventional dynamic voltage and frequency scaling (DVFS) works at the program level and does not provide adequate frequency-scaling granularity for instruction-level timing management [1]. Razor-based techniques leverage error detection to exploit the DTS on a cycle-by-cycle basis [2]. However, it requires additional error-detection circuits and architecture-level co-design for error recovery [3]. Supply droop-based adaptive clocking was used to reduce timing margin under PVT variation, but does not address the instruction-level timing variation [4]. Recently, instruction-based adaptive clock schemes have been introduced to enhance a CPU's operation [5-6]. For example, instruction types at the execution stage were used to provide timing control for a simple pipeline structure. However, this scheme lacks adequate consideration for other pipeline stages whose timing may not be opcode dependent [5]. In [6], the instruction-execution sequence was evaluated at the compiler level with the timing encoded into the instruction code. The scheme considers all pipeline stages but relies on in-order execution of instructions for proper timing encoding from the compiler.
AB - Cycle-by-cycle dynamic timing slack (DTS), which represents extra timing margin from the critical-path timing slack reported by the static timing analysis (STA), has been observed at both program level and instruction level. Conventional dynamic voltage and frequency scaling (DVFS) works at the program level and does not provide adequate frequency-scaling granularity for instruction-level timing management [1]. Razor-based techniques leverage error detection to exploit the DTS on a cycle-by-cycle basis [2]. However, it requires additional error-detection circuits and architecture-level co-design for error recovery [3]. Supply droop-based adaptive clocking was used to reduce timing margin under PVT variation, but does not address the instruction-level timing variation [4]. Recently, instruction-based adaptive clock schemes have been introduced to enhance a CPU's operation [5-6]. For example, instruction types at the execution stage were used to provide timing control for a simple pipeline structure. However, this scheme lacks adequate consideration for other pipeline stages whose timing may not be opcode dependent [5]. In [6], the instruction-execution sequence was evaluated at the compiler level with the timing encoded into the instruction code. The scheme considers all pipeline stages but relies on in-order execution of instructions for proper timing encoding from the compiler.
UR - http://www.scopus.com/inward/record.url?scp=85063433545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063433545&partnerID=8YFLogxK
U2 - 10.1109/ISSCC.2019.8662389
DO - 10.1109/ISSCC.2019.8662389
M3 - Conference contribution
AN - SCOPUS:85063433545
T3 - Digest of Technical Papers - IEEE International Solid-State Circuits Conference
SP - 318
EP - 320
BT - 2019 IEEE International Solid-State Circuits Conference, ISSCC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Solid-State Circuits Conference, ISSCC 2019
Y2 - 17 February 2019 through 21 February 2019
ER -