Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset
| Field | Value |
|---|---|
| Content Provider | ACM Digital Library |
| Authors | Ahn, Dong H.; Rakamarić, Zvonimir; Schulz, Martin; Gopalakrishnan, Ganesh; Lee, Gregory L.; Laguna, Ignacio |
| Abstract | Reproducibility, the ability to repeat program executions with the same numerical result or code behavior, is crucial for computational science and engineering applications. However, non-determinism in concurrency scheduling often hampers achieving this ability on high performance computing (HPC) systems. To aid in managing the adverse effects of non-determinism, prior work has provided techniques to achieve bit-precise reproducibility, but most of them focus only on small-scale parallelism. While scalable techniques recently emerged, they are disparate and target special purposes, e.g., single-schedule domains. On current systems with $O(10^{6})$ compute cores and future ones with $O(10^{9})$, any technique that does not embrace a unified, targeted, and multilevel approach will fall short of providing reproducibility. In this paper, we argue for a common toolset that embodies this approach, where programmers select and compose complementary tools and can effectively, yet scalably, analyze, control, and eliminate sources of non-determinism at scale. This allows users to gain reproducibility only to the levels demanded by specific code development needs. We present our research agenda and ongoing work toward this goal. |
| Starting Page | 41 |
| Ending Page | 44 |
| Page Count | 4 |
| File Format | |
| ISBN | 9781450324991 |
| DOI | 10.1145/2532352.2532357 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publication Date | 2013-11-17 |
| Publisher Location | New York |
| Access Restriction | Subscribed |
| Content Type | Text |
| Resource Type | Article |