Formal Analysis of MPI-Based Parallel Programs: Present and Future∗
| Content Provider | Semantic Scholar |
|---|---|
| Author | Gopalakrishnan, Ganesh; Kirby, Robert Michael; Siegel, Stephen F.; Thakur, Rajeev; Gropp, William; Lusk, E.; de Supinski, Bronis R.; Schulz, Martin; Bronevetsky, Greg |
| Copyright Year | 2011 |
| Abstract | The Message Passing Interface (MPI) is the dominant programming model for high-performance computing. Applications that use over 100,000 cores are currently running on large HPC systems. Unfortunately, MPI usage presents a number of correctness challenges that can result in wasted simulation cycles and errant computations. In this article, we describe recently developed formal and semi-formal approaches for verifying MPI applications, highlighting the scalability/coverage tradeoffs they make. We concentrate on MPI because of its ubiquity and because similar issues will arise when developing verification techniques for other performance-oriented concurrency APIs. We conclude with a look at the inevitability of multiple concurrency models in future parallel systems and point out why the HPC community urgently needs increased participation from formal methods researchers. 1 MPI and Importance of Correctness: The Message Passing Interface (MPI) Standard was originally developed around 1993 by the MPI Forum, a group of vendors, parallel programming researchers, and computational scientists. The document is not issued by an official standards organization, but it has become a de facto standard through near-universal adoption. Development of MPI continues, with MPI-2.2 released in 2009. The standard has been published on the web [1] and as a book, and several other books based on it have appeared (e.g., [2, 3]). Implementations are available in open-source form [4, 5], from software vendors such as Microsoft and Intel, and from every vendor of high-performance computing systems. MPI is widely cited; Google Scholar returned 28,400 hits for “+MPI +"Message Passing Interface"” in September 2010. MPI is ubiquitous: it runs everywhere, from embedded systems to the largest supercomputers. Many MPI programs represent dozens, if not hundreds, of person-years of development, including their calibration for accuracy and performance tuning. MPI was designed to support highly scalable computing; applications that use over 100,000 cores are currently running on systems such as the IBM Blue Gene/P (Figure 1) and Cray XT5. Scientists and engineers all over the world use MPI in thousands of applications, such as investigations of alternative energy sources and weather simulations. For high-performance computing, MPI is by far the dominant programming model; most (at some centers, all) applications running on supercomputers use MPI. MPI is considered a requirement even for Exascale systems, at least by application developers [6]. Unfortunately, the MPI debugging methods available to these application developers are wasteful and ultimately unreliable. Existing MPI testing tools seldom provide coverage guarantees: they examine essentially equivalent execution sequences, thus reducing testing efficiency, and they fare even worse at large problem scales. Consider the costs of HPC bugs: (i) a high-end HPC center costs hundreds of millions of dollars to commission, and the machines become obsolete within six years; (ii) in many of these centers, over $3 million is spent on electricity alone each year; and (iii) research teams apply for computer time through competitive proposals, spending years planning their experiments. In addition to these development costs, one must add the costs to society of relying on potentially defective software to inform decisions on issues of great importance, such as climate change. Because the stakes are so high, formal methods can play an important role in debugging and verifying MPI applications. In this paper, we give a survey of existing techniques, clearly describing the pros and cons of each approach. These methods have value far beyond MPI, addressing the general needs of future concurrency application developers, who will inevitably use a variety of low-level concurrency APIs. Overview of MPI: Historically, parallel systems have used either message passing or shared memory for communication. Compared to other message-passing systems noted for their parsimony, MPI supports a large number of cohesively engineered features essential for designing large-scale simulations. MPI-2.2 [1] specifies over 300 functions; most developers use only a few dozen of these in any given application. MPI programs consist of one or more threads of execution… (∗ Supported in part by Microsoft, NSF CNS-0509379, CCF-0811429, CCF-0903408, CCF-0953210, CCF-0733035, and DOE ASCR DE-AC02-06CH11357. Accepted 6/1/11 to appear in CACM.) |
| File Format | PDF; HTM/HTML |
| Alternate Webpage(s) | http://www2.cs.utah.edu/~kirby/Publications/Kirby-60.pdf |
| Alternate Webpage(s) | http://www.sci.utah.edu/publications/gopalakrishnan11/Gopalakrishnan_ACM2011.pdf |
| Alternate Webpage(s) | http://www.cs.utah.edu/~kirby/Publications/Kirby-60.pdf |
| Alternate Webpage(s) | http://formalverification.cs.utah.edu/pdf/cacm-hpc-fv.pdf |
| Alternate Webpage(s) | https://www.cs.utah.edu/~kirby/Publications/Kirby-60.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
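For context on the message-passing style described in the abstract above, the sketch below is a minimal MPI program in C. It is not taken from the paper; the message contents and the use of a wildcard receive are illustrative assumptions, chosen to show the kind of nondeterministic message matching that the verification tools surveyed in the article must reason about.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id (rank) */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    if (rank != 0) {
        /* Every non-root rank sends its rank number to rank 0. */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        /* Rank 0 collects one message from each other rank. The wildcard
           MPI_ANY_SOURCE makes the arrival order nondeterministic, which is
           the kind of behavior formal MPI verification tools must cover. */
        for (int i = 1; i < size; i++) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0 received %d\n", value);
        }
    }

    MPI_Finalize();                       /* shut down the MPI runtime */
    return 0;
}
```

With a typical MPI installation (e.g., MPICH or Open MPI), this would be compiled with mpicc and launched as, for example, `mpiexec -n 4 ./a.out`, starting four processes that each execute `main`.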