Spying on the Floating Point Behavior of Existing, Unmodified Scientific Applications

Peter Dinda, Alex Bernat, Conor Hetland

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scientific (and other) applications are critically dependent on calculations done using IEEE floating point arithmetic. A number of concerns have been raised about correctness in such applications given the numerous gotchas the IEEE standard presents for developers, as well as the complexity of its implementation at the hardware and compiler levels. The standard and its implementations do provide mechanisms for analyzing floating point arithmetic as it executes, making it possible to find and track problematic operations. However, this capability is seldom used in practice. In response, we have developed FPSpy, a tool that provides this capability when operating underneath existing, unmodified x64 application binaries on Linux, including those using thread- and process-level parallelism. FPSpy can observe application behavior without any cooperation from the application or developer, and can potentially be deployed as part of a job launch process. We present the design, implementation, and performance evaluation of FPSpy. FPSpy operates conservatively, getting out of the way if the application itself begins to use any of the OS or hardware features that FPSpy depends on. Its overhead can be throttled, allowing a tradeoff between which and how many unusual events are to be captured, and the slowdown incurred by the application, with the low point providing virtually zero slowdown. We evaluated FPSpy by using it to methodically study seven widely-used applications/frameworks from a range of domains (five of which are in the NSF XSEDE top-20), as well as the NAS and PARSEC benchmark suites. All told, these comprise about 7.5 million lines of source code in a wide range of languages, and parallelism models (including OpenMP and MPI). FPSpy was able to produce trace information for all of them. The traces show that problematic floating point events occur in both the applications and the benchmarks. 
Analysis of the rounding behavior captured in our traces also suggests the feasibility of an approach to adding adaptive precision underneath existing, unmodified binaries.

Original language: English (US)
Title of host publication: HPDC 2020 - Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 5-16
Number of pages: 12
ISBN (Electronic): 9781450370523
State: Published - Jun 23 2020
Event: 29th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2020 - Stockholm, Sweden
Duration: Jun 23 2020 to Jun 26 2020

Publication series

Name: HPDC 2020 - Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference: 29th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2020
Country: Sweden
City: Stockholm
Period: 6/23/20 to 6/26/20

Keywords

  • floating point arithmetic
  • IEEE 754
  • software development

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
