A High Performance Version of Parallel Lapack: Preliminary Report
| Content Provider | Semantic Scholar |
|---|---|
| Author | Strazdins, Peter. |
| Copyright Year | 1996 |
| Abstract | Dense linear algebra computations require the technique of 'block-partitioned algorithms' for their efficient implementation on memory-hierarchy multiprocessors. Most existing studies and libraries for this purpose, for example ScaLAPACK, assume that the block or panel width ω for these algorithms must be the same as the matrix distribution block size r. We present a project in progress to extend ScaLAPACK using the 'distributed panels' technique, i.e. to allow ω > r, which has the twofold advantages of improving performance for memory-hierarchy multiprocessors and yielding a simplified user interface. A key element of the project is a general Distributed BLAS implementation, which was developed primarily for the Fujitsu AP series of multiprocessors but is now fully portable. Other key components are versions of the BLAS and BLACS libraries to achieve high-performance cell computation and communication, respectively, on the required target multiprocessor architectures. Preliminary experiences and results using the Fujitsu AP1000 multiprocessor indicate that good performance improvements are possible for relatively little effort. Performance models indicate that similar improvements can be expected on multiprocessors with relatively low communication costs and large (second-level) caches. Future work in the project includes improving the DBLAS to 'cache' previously communicated data and porting and testing the codes on other multiprocessor platforms. |
| File Format | PDF HTM / HTML |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |