Thermal Management for FPGA Nodes in HPC Systems

Yingyi Luo, Joshua C. Zhao, Arnav Aggarwal, Seda Ogrenci-Memik, Kazutomo Yoshii

Research output: Contribution to journal › Article › peer-review


The integration of FPGAs into large-scale computing systems is gaining attention. In these systems, real-time data handling for networking, scientific computing tasks, and machine learning can be executed with customized datapaths on reconfigurable fabric within heterogeneous compute nodes. At the same time, thermal management, particularly containing cooling cost and guaranteeing reliability, is a continuing concern. The introduction of new heterogeneous components into HPC nodes adds further complexity to thermal modeling and management, and the thermal behavior of multi-FPGA systems deployed within large compute clusters remains less explored. In this article, we first show that the thermal behaviors of different FPGAs of the same generation can vary due to their physical locations in a rack and process variation, even though they are running the same tasks. We present a machine learning-based model to capture the thermal behavior of each individual FPGA in the cluster. We then propose two thermal management strategies guided by our thermal model. First, we mitigate thermal variation and hotspots across the cluster by proactive thermal-aware task placement. Under the tested system and benchmarks, we achieve up to 26.4 °C and on average 13.3 °C system temperature reduction with no performance penalty. Second, we utilize this thermal model to guide HLS parameter tuning at the task design stage to achieve improved thermal response after deployment.
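The idea of model-guided task placement can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a hypothetical per-FPGA model has already produced a table of predicted temperatures, and it simply searches for the one-task-per-FPGA assignment with the lowest predicted peak. The function name `place_tasks` and all temperature values are made up for illustration.

```python
from itertools import permutations


def place_tasks(predicted_temp):
    """Return (assignment, peak), where assignment[i] is the FPGA for task i.

    predicted_temp[i][j] is the temperature (in °C) that a hypothetical
    per-FPGA model predicts when task i runs on FPGA j. The search is
    exhaustive, so it is only practical for a handful of boards.
    """
    n_tasks = len(predicted_temp)
    n_fpgas = len(predicted_temp[0])
    best_assignment, best_peak = None, float("inf")
    # Try every way of giving each task its own FPGA.
    for fpgas in permutations(range(n_fpgas), n_tasks):
        peak = max(predicted_temp[t][f] for t, f in enumerate(fpgas))
        if peak < best_peak:
            best_assignment, best_peak = fpgas, peak
    return best_assignment, best_peak


# Illustrative, made-up predictions: 3 tasks x 3 FPGAs.
predicted = [
    [71.0, 64.0, 80.0],
    [55.0, 52.0, 60.0],
    [66.0, 61.0, 75.0],
]
assignment, peak = place_tasks(predicted)
print(assignment, peak)
```

With these invented numbers, the hottest task avoids the board on which it is predicted to run hottest, even though the same three tasks run on the same three boards. A real cluster would need a scalable assignment method rather than exhaustive search.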

Original language: English (US)
Article number: 3423494
Journal: ACM Transactions on Design Automation of Electronic Systems
Issue number: 2
State: Published - Oct 2020


Keywords

  • Thermal modeling
  • high performance computing
  • task placement
  • thermal-aware design

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
  • Electrical and Electronic Engineering

