Efficient Collective Operations using Remote Memory Operations on VIA-Based Clusters (2003)
| Content Provider | CiteSeerX |
|---|---|
| Author | Gupta, Rinku; Balaji, Pavan; Panda, Dhabaleswar K.; Nieplocha, Jarek |
| Description | High-performance scientific applications require efficient, fast collective communication operations. Most collective communication operations have been built on top of point-to-point send/receive primitives. Modern user-level protocols such as VIA and the emerging InfiniBand architecture support remote DMA (RDMA) operations. These operations not only move data between nodes with low overhead but also let the user create a logical shared memory address space across the nodes. This capability shows potential for designing high-performance, scalable collective operations. In this paper, we discuss the design issues that underlie an RDMA-supported collective communication library. As a proof of concept, we have designed and implemented RDMA-based broadcast and RDMA-based allreduce operations. For 4KB messages on a 16-node cluster, RDMA-based broadcast shows a 14% benefit over send/receive-based broadcast. We also introduce a new reduce algorithm, the Degree-k tree-based reduce algorithm. Combining the RDMA mechanism with this algorithm yields an allreduce benefit of 38% for 4-byte messages and 9% for 4KB messages on a 16-node cluster. We also introduce analytical models for broadcast and allreduce to predict the performance of this design on large-scale clusters. For allreduce on 512- and 1024-node clusters, these models predict a benefit of about 35-40% for 4-byte messages and around 14% for 4KB messages. Published in the Proceedings of the International Parallel and Distributed Processing Symposium. |
| File Format | |
| Language | English |
| Publication Date | 2003-01-01 |
| Access Restriction | Open |
| Subject Keywords | Collective Communication Library; Various Design Issue; Scalable Collective Operation; Low Overhead; VIA-Based Cluster; Receive-Based Broadcast; Remote Memory Operation; RDMA-Based Allreduce Operation; Node Cluster; Analytical Model; Point-to-Point Send/Receive Primitive; RDMA Mechanism; Collective Communication Operation; Byte Message; Large-Scale Cluster; Fast Collective Communication Operation; Allreduce Operation; Efficient Collective Operation; Modern User-Level Protocol; High-Performance Scientific Application; Logical Shared Memory Address Space; High Performance; Performance Benefit; Data Size; RDMA-Based Broadcast |
| Content Type | Text |
| Resource Type | Article |
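The abstract does not spell out how the Degree-k tree is built. One plausible reading, sketched here as an assumption rather than the authors' construction, is a tree in which each parent combines the partial results of up to k children per step. Under that reading a reduce over N nodes takes ceil(log base (k+1) of N) steps, and k = 1 gives an ordinary binomial tree. The sketch simulates this in a single Python process. The function names `degree_k_reduce` and `degree_k_allreduce`, the helper `_strides`, and the rank layout (the lowest rank in each group of k+1 acts as parent) are all hypothetical. The RDMA reads the paper relies on are modeled as plain list accesses, and allreduce is modeled as a reduce followed by a broadcast down the same tree.

```python
from typing import Callable, List, Optional, Tuple

Op = Callable[[int, int], int]


def _add(a: int, b: int) -> int:
    return a + b


def _strides(n: int, k: int) -> List[int]:
    """Hypothetical helper: parent-to-first-child distance per tree level, bottom-up."""
    strides = []
    s = 1
    while s < n:
        strides.append(s)
        s *= k + 1
    return strides


def degree_k_reduce(values: List[int], k: int, op: Op = _add) -> Tuple[int, int]:
    """Reduce `values` (one per rank) to rank 0; return (result, steps)."""
    if k < 1 or not values:
        raise ValueError("need k >= 1 and at least one rank")
    buf = list(values)  # buf[r] is the partial result currently held by rank r
    n = len(buf)
    levels = _strides(n, k)
    for stride in levels:
        group = stride * (k + 1)
        for parent in range(0, n, group):
            # The parent combines up to k children in this step
            # (an RDMA read in the paper's setting; a list read here).
            for j in range(1, k + 1):
                child = parent + j * stride
                if child < n:
                    buf[parent] = op(buf[parent], buf[child])
    return buf[0], len(levels)


def degree_k_allreduce(
    values: List[int], k: int, op: Op = _add
) -> Tuple[List[Optional[int]], int]:
    """Reduce to rank 0, then broadcast the result back down the same tree."""
    result, up_steps = degree_k_reduce(values, k, op)
    n = len(values)
    out: List[Optional[int]] = [None] * n
    out[0] = result
    levels = _strides(n, k)
    for stride in reversed(levels):  # start from the top of the tree
        group = stride * (k + 1)
        for parent in range(0, n, group):
            for j in range(1, k + 1):
                child = parent + j * stride
                if child < n:
                    out[child] = out[parent]
    return out, up_steps + len(levels)
```

For example, `degree_k_allreduce(list(range(16)), 3)` leaves 120 on every rank after 2 reduce steps and 2 broadcast steps, whereas `k = 1` needs 4 of each. A larger k means fewer steps but more child buffers for each parent to combine per step, which is the trade-off a degree parameter exposes.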