X10 at Petascale
| Field | Value |
|---|---|
| Content Provider | Semantic Scholar |
| Author | Herta, Benjamin; Cunningham, Dolora; Grove, David; Kambadur, Prabhanjan; Saraswat, Vijay A.; Shinnar, Avraham; Takeuchi, Mikio; Vaziri, Mandana |
| Copyright Year | 2013 |
| Abstract | X10 is a high-performance, high-productivity programming language aimed at large-scale distributed and shared-memory parallel applications. It is based on the Asynchronous Partitioned Global Address Space (APGAS) programming model, supporting the same fine-grained concurrency mechanisms within and across nodes. We demonstrate that X10 delivers solid performance at petascale by running (weak scaling) eight application kernels on an IBM Power 775 supercomputer utilizing up to 55680 Power7 cores (1.7 Pflop/s). We sketch advances in distributed termination detection, distributed load balancing, and use of high-performance interconnects that enable X10 to scale out to thousands of nodes. 1. OVERVIEW X10 is a high-performance, high-productivity programming language developed at IBM. It is a class-based, strongly typed, garbage-collected, object-oriented language [8, 7]. To support concurrency and distribution, X10 uses the Asynchronous Partitioned Global Address Space programming model (APGAS [6]). This model introduces two key concepts, places and asynchronous tasks, and a few mechanisms for coordination. With these, APGAS can express both regular and irregular parallelism, message-passing-style and active-message-style computations, and fork-join and bulk-synchronous parallelism. In contrast to hybrid models like MPI+OpenMP, the same constructs underpin both intra- and inter-place concurrency (a minimal X10 sketch of these constructs follows the metadata table below). We present experimental results for eight kernels. We implement the four HPC Class 2 Challenge benchmarks: HPL, FFT, RandomAccess, and Stream Triad [3], as well as Smith-Waterman [10], Betweenness Centrality [1], K-Means [4], and Unbalanced Tree Search (UTS) [5]. We run them on a large Power 775 system with a theoretical peak performance of 1.7 Pflop/s. (The X10 tool chain and the benchmark codes are publicly available at http://x10-lang.org.) For the HPC Challenge benchmarks, X10 today achieves 41% to 87% of the system's potential at scale, as reported by IBM's optimized-runs entry to the HPC Class 1 Challenge in Nov. 2012 [2]. To the best of our knowledge, our UTS implementation is the first to scale linearly to petaflop systems for geometric trees. Our K-Means and Smith-Waterman codes also scale linearly. Although still statically load-balanced, our Betweenness Centrality code can process 245 billion edges per second using 47040 cores. These results have been made possible by our solutions to the distributed termination detection problem at scale, the effective use of high-performance interconnects, and a refined distributed load balancing scheme for UTS. 2. OPTIMIZING FOR PETASCALE The X10 programming model relies heavily on termination detection. Detection scopes can be nested, distributed, and collocated. The reference implementation has to handle arbitrary distributed task graphs and cope with arbitrary network latencies; as a result, it does not scale. We identify a series of common patterns of distributed termination detection and provide specialized scalable implementations for these, relying on a combination of static analysis, dynamic analysis, and user input (pragmas) to guide selection. Supercomputers such as the Power 775 are built upon high-performance interconnects and provide network acceleration mechanisms such as collectives and RDMAs. We augment X10's communication APIs to expose these primitives when available and emulate them when not (e.g., over Ethernet). The UTS benchmark measures the rate of traversal of a tree generated on the fly using a splittable random number generator. The tree is highly irregular and unpredictable, so dynamic distributed load balancing is indispensable. We start from the state-of-the-art lifeline-based load balancer described in [9], which scales well only to hundreds of Power 775 nodes, and make it scale to thousands of nodes. The reference algorithm cleverly combines random work-stealing with organized work-sharing to distribute work dynamically and efficiently. We improve the work queue implementation by compactly representing intervals of sibling nodes (see the second sketch after the table). We reduce termination detection overheads by separating work-stealing and work-sharing termination scopes. We optimize routes by taming the random victim selection and exploiting indirect routes in termination detection. |
| File Format | PDF; HTM/HTML |
| Alternate Webpage(s) | http://sc13.supercomputing.org/sites/default/files/PostersArchive/tech_posters/post158s2-file3.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
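
The APGAS constructs and the pragma-guided termination detection mentioned in the abstract can be illustrated with a short, self-contained X10 sketch. It is not taken from the paper or its benchmark codes: the class name is invented, the code follows X10 2.x syntax, and the `@Pragma(Pragma.FINISH_SPMD)` annotation reflects the finish pragmas shipped with the X10 2.x compiler, whose exact constant names may differ between releases.

```x10
import x10.compiler.Pragma;

// Minimal APGAS sketch (illustrative, not the paper's code): one
// asynchronous task is spawned at every place, and the enclosing
// finish blocks until all of them, including remote ones, complete.
// The pragma is an assumption based on the X10 2.x compiler: it
// asserts that this finish follows a flat, one-task-per-place pattern,
// so the runtime can substitute a specialized, cheaper
// termination-detection protocol for the general-purpose one.
public class ApgasSketch {
    public static def main(args:Rail[String]) {
        @Pragma(Pragma.FINISH_SPMD)
        finish for (p in Place.places()) {
            at (p) async {
                Console.OUT.println("hello from place " + here.id);
            }
        }
    }
}
```

Dropping the pragma leaves the program's meaning unchanged; it is an optimization hint that only affects which termination-detection implementation the runtime selects, provided the asserted pattern actually holds.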
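The remark about "compactly representing intervals of sibling nodes" in the UTS work queue can be sketched as follows. This is a hypothetical reconstruction, not the authors' implementation: the class name, fields, fixed capacity, and the steal policy (take half of every interval) are illustrative assumptions, and the code assumes X10 2.4-style syntax (Long-indexed Rails, Int literals with an `n` suffix). The point is that a deque entry describes a parent node together with a range of child indices, so a steal transfers a few (parent, lower, upper) triples rather than an explicit list of tree nodes.

```x10
// Hypothetical interval-based work queue for a UTS-style traversal
// (illustrative names and policy; not the paper's code). Each slot
// records a parent descriptor plus the half-open range [lower, upper)
// of its children that still have to be expanded.
class IntervalDeque {
    private static val CAPACITY = 4096;            // sketch: no growth or overflow handling
    private val parent = new Rail[Long](CAPACITY); // stand-in for the node's hash state
    private val lower  = new Rail[Int](CAPACITY);  // first child index still to expand
    private val upper  = new Rail[Int](CAPACITY);  // one past the last child index
    private var top:Long = 0;

    // record that children [lo, hi) of node p are still pending
    public def push(p:Long, lo:Int, hi:Int):void {
        if (lo >= hi) return;
        parent(top) = p; lower(top) = lo; upper(top) = hi;
        top++;
    }

    // give a thief roughly half of every pending interval; the victim
    // keeps [lo, mid) and the thief receives [mid, hi) for each entry
    public def steal():IntervalDeque {
        val loot = new IntervalDeque();
        for (i in 0..(top-1)) {
            val mid = (lower(i) + upper(i)) / 2n;
            if (mid > lower(i)) {
                loot.push(parent(i), mid, upper(i));
                upper(i) = mid;
            }
        }
        return loot;
    }
}
```

In UTS, children are generated deterministically from the parent's hash state and a child index, which is what makes an index interval sufficient to reconstruct stolen subtrees on the thief's side; the Long descriptor above stands in for that state for brevity.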