Abstract
Many scientific data sets contain a temporal dimension: they store values at the same spatial locations across successive time stamps. Some of the largest temporal data sets are produced by parallel computing applications such as climate change and fluid dynamics simulations. Because temporal data sets can be very large, transferring them between storage locations takes a great deal of time. Data compression allows such files to be transferred faster and to occupy less storage space. NUMARCK is a lossy data compression algorithm for temporal data sets that learns the emerging distributions of element-wise change ratios along the temporal dimension and encodes them into an index table, yielding a concise representation. This paper presents a parallel implementation of NUMARCK. Evaluated on six data sets obtained from climate and astrophysics simulations, parallel NUMARCK achieved scalable speedups of up to 8788 when running 12800 MPI processes on a parallel computer. We also compare its compression ratios against those of two other lossy data compression algorithms, ISABELA and ZFP. The results show that NUMARCK achieved higher compression ratios than both ISABELA and ZFP.
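The abstract describes NUMARCK's core idea only at a high level. The following is a minimal, serial Python sketch of that idea for one pair of consecutive time steps, under several assumptions that the abstract does not state: change ratios are taken as (x_t - x_{t-1}) / x_{t-1}; the distribution is "learned" with simple equal-width binning (a stand-in chosen for brevity, not necessarily the paper's strategy); the index table holds the representative ratios of the `table_size` most populated bins; and any element whose ratio cannot be approximated within a user-given error bound, or whose previous value is zero, is stored exactly. All names (`change_ratios`, `encode`, `decode`, `bin_width`, `table_size`, `error_bound`) are hypothetical, and the sketch says nothing about the parallel MPI implementation the paper presents.

```python
from collections import Counter


def change_ratios(prev, curr):
    """Element-wise change ratios (curr - prev) / prev; None where prev == 0."""
    return [None if p == 0 else (c - p) / p for p, c in zip(prev, curr)]


def encode(prev, curr, error_bound, bin_width, table_size):
    """Hypothetical sketch: approximate each change ratio by the centre of its
    equal-width bin, keep the `table_size` most populated bins as the index
    table, and store verbatim any element that cannot be approximated within
    `error_bound` (measured on the ratio, an assumption of this sketch)."""
    ratios = change_ratios(prev, curr)
    # Bin id of each ratio (equal-width binning is an assumption of this sketch).
    bins = [None if r is None else round(r / bin_width) for r in ratios]
    counts = Counter(b for b in bins if b is not None)
    top_bins = [b for b, _ in counts.most_common(table_size)]
    table = [b * bin_width for b in top_bins]  # representative ratio per entry
    index_of = {b: i for i, b in enumerate(top_bins)}

    indices, exact = [], {}
    for i, (r, b) in enumerate(zip(ratios, bins)):
        k = index_of.get(b)
        if r is not None and k is not None and abs(table[k] - r) <= error_bound:
            indices.append(k)   # compressible: store only a table index
        else:
            indices.append(-1)  # incompressible: marker, value kept exactly
            exact[i] = curr[i]
    return table, indices, exact


def decode(prev, table, indices, exact):
    """Rebuild the current time step from the previous one."""
    return [exact[i] if k < 0 else p * (1.0 + table[k])
            for i, (p, k) in enumerate(zip(prev, indices))]


prev = [2.0, 4.0, 8.0, 4.0, 1.0, 0.0]
curr = [2.5, 5.0, 10.0, 5.2, 2.0, 3.0]
table, indices, exact = encode(prev, curr, error_bound=0.01,
                               bin_width=0.125, table_size=1)
print(table, indices, sorted(exact))               # [0.25] [0, 0, 0, -1, -1, -1] [3, 4, 5]
print(decode(prev, table, indices, exact) == curr)  # True
```

In this toy example, three of the six elements are represented by an index into a one-entry table, while the other three (a ratio outside the error bound, a ratio from an unpopular bin, and an element whose previous value is zero) are kept exactly. Any compression gain in such a scheme comes from storing small indices in place of full floating-point values; loosening `error_bound` to 0.1 would let the fourth element be indexed as well, at the cost of a reconstruction error of 0.2.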
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 23rd IEEE International Conference on High Performance Computing, HiPC 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 62-71 |
Number of pages | 10 |
ISBN (Electronic) | 9781509054114 |
State | Published - Feb 1 2017 |
Event | 23rd IEEE International Conference on High Performance Computing, HiPC 2016, Hyderabad, India (Dec 19 2016 → Dec 22 2016) |
Publication series
Name | Proceedings - 23rd IEEE International Conference on High Performance Computing, HiPC 2016 |
---|---|
Other
Other | 23rd IEEE International Conference on High Performance Computing, HiPC 2016 |
---|---|
Country/Territory | India |
City | Hyderabad |
Period | 12/19/16 → 12/22/16 |
Funding
This work is supported in part by the following grants: NSF awards CCF-1029166, IIS-1343639, CCF-1409601; DOE awards DE-SC0007456, DE-SC0014330; AFOSR award FA9550-12-1-0458; NIST award 70NANB14H012; DARPA award N66001-15-C-4036. C.F. acknowledges funding provided by the Australian Research Council's Discovery Projects (grants DP130102078 and DP150104329). C.F. further acknowledges supercomputing time provided by the Jülich Supercomputing Centre (grant hhd20), the Leibniz Rechenzentrum and the Gauss Centre for Supercomputing (grants pr32lo, pr48pi and GCS Large-scale project 10391), the Partnership for Advanced Computing in Europe (PRACE grant pr89mu), the Australian National Computational Infrastructure (grant ek9), and the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia. The simulation software FLASH was in part developed by the DOE-supported Flash Center for Computational Science at the University of Chicago.
Keywords
- error-bound
- lossy data compression
- parallel data compression
- temporal change ratio
ASJC Scopus subject areas
- Hardware and Architecture