A flexible I/O arbitration framework for netCDF-based big data processing workflows on high-end supercomputers

Jianwei Liao*, Balazs Gerofi, Guo Yuan Lien, Takemasa Miyoshi, Seiya Nishizawa, Hirofumi Tomita, Wei Keng Liao, Alok Choudhary, Yutaka Ishikawa

*Corresponding author for this work

Research output: Contribution to journal › Article

6 Scopus citations

Abstract

As high-performance computing and Big Data processing converge, large-scale data analytics workloads are increasingly deployed on high-end supercomputers. Such applications often take the form of complex workflows with many different components, assimilating data both from scientific simulations and from measurements streamed from sensor networks such as radars and satellites. For example, as part of Japan's Flagship 2020 (post-K) supercomputer project, RIKEN is investigating the feasibility of a highly accurate weather forecasting system that would provide a real-time outlook for severe guerrilla rainstorms. One of the main performance bottlenecks of this application is the lack of efficient communication among workflow components, which currently takes place over the parallel file system.

In this paper, we present an initial study of a direct communication framework for complex workflows that eliminates unnecessary file I/O among components. Specifically, we propose an I/O arbitration layer that provides direct parallel data transfer, both synchronous and asynchronous, among job components that rely on the netCDF interface for their I/O operations. Our solution requires only minimal modifications to application code. Moreover, we propose a configuration file–based approach that lets users specify the desired data transfer pattern among workflow components, offering a general solution for different application contexts. We present a preliminary evaluation of the proposed framework on the K Computer (using up to 4,800 compute nodes), with RIKEN's experimental weather forecasting workflow as a case study.
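The abstract describes the configuration-driven part of the framework only at a high level: the arbitration layer serves components that perform I/O through netCDF, and a configuration file specifies the desired data transfer pattern among them. As a rough illustration of that idea, the Python sketch below parses a made-up, whitespace-separated configuration format and decides, for a given component and netCDF path, whether the operation should go to a direct peer transfer (synchronous or asynchronous) or fall back to ordinary file I/O. The file format, the names `TransferRule`, `parse_config`, and `route`, and the component names `simulation` and `assimilation` are all hypothetical and not taken from the paper; the actual data movement is not modeled.

```python
"""Hypothetical sketch of configuration-driven routing for netCDF transfers.

The config format and all names here are illustrative assumptions, not the
framework's actual syntax or API. Only the routing decision is modeled.
"""
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class TransferRule:
    pattern: str   # glob matched against the netCDF file path
    producer: str  # component that writes the file
    consumer: str  # component that reads the file
    mode: str      # "sync" or "async" transfer


def parse_config(text: str) -> list[TransferRule]:
    """Parse lines of the form 'pattern producer consumer mode'.

    Blank lines and lines starting with '#' are ignored.
    """
    rules = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        pattern, producer, consumer, mode = fields
        if mode not in ("sync", "async"):
            raise ValueError(f"line {lineno}: unknown mode {mode!r}")
        rules.append(TransferRule(pattern, producer, consumer, mode))
    return rules


def route(rules: list[TransferRule], component: str, path: str) -> tuple[str, str] | None:
    """Return (peer component, mode) if `path` opened by `component` should
    bypass the file system, or None to fall back to ordinary file I/O."""
    for rule in rules:
        if not fnmatch(path, rule.pattern):
            continue
        if component == rule.producer:
            return (rule.consumer, rule.mode)
        if component == rule.consumer:
            return (rule.producer, rule.mode)
    return None


# Hypothetical two-component workflow: model output flows to data
# assimilation asynchronously; analyses flow back synchronously.
CONFIG = """
# pattern        producer      consumer      mode
history_*.nc     simulation    assimilation  async
analysis_*.nc    assimilation  simulation    sync
"""

rules = parse_config(CONFIG)
print(route(rules, "simulation", "history_0001.nc"))  # ('assimilation', 'async')
print(route(rules, "simulation", "restart.nc"))       # None: regular file I/O
```

Falling back to ordinary file I/O when no rule matches keeps unlisted files on the parallel file system, which is one plausible way to keep application changes minimal; the abstract does not say whether the real framework behaves this way.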

Original language: English (US)
Article number: e4161
Journal: Concurrency Computation
Volume: 29
Issue number: 15
State: Published - Aug 10 2017

Keywords

  • asynchronous transfer
  • big data processing
  • customizability
  • netCDF
  • parallel direct data transfer
  • real time

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics
