Scaling LAPACK Panel Operations Using Parallel Cache Assignment (2010)
| Content Provider | CiteSeerX |
|---|---|
| Author | Castaldo, Anthony M.; Whaley, R. Clint; Samuel, Siju |
| Description | In LAPACK, many matrix operations are cast as block algorithms that iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high-performance Level 3 BLAS. The Level 3 BLAS have excellent scaling, but panel processing tends to be bus bound, and thus scales with bus speed rather than the number of processors (p). Amdahl's law therefore ensures that as p grows, the panel computation will become the dominant cost of these LAPACK routines. Our contribution is a novel parallel cache assignment approach to the panel factorization, which we show scales well with p. We apply this general approach to the QR, QL, RQ, LQ and LU panel factorizations. We show results for two commodity platforms: an 8-core Intel platform and a 32-core AMD platform. For both platforms and all twenty implementations (five factorizations, each available in four types), we present results that demonstrate that our approach yields significant speedup over the existing state of the art. |
| File Format | |
| Language | English |
| Publisher Date | 2010-01-01 |
| Publisher Institution | In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM |
| Access Restriction | Open |
| Subject Keyword | LAPACK Many Matrix Operation; LAPACK Routine; Twenty Implementation; Panel Factorization; Amdahl Law Therefore; LU Panel Factorization; Novel Parallel Cache Assignment Approach; 32-core AMD Platform; General Approach; Block Algorithm; Approach Yield Significant Speedup; Remainder Matrix; Panel Processing; Excellent Scaling; Commodity Platform; Unblocked Algorithm; 8-core Intel Platform; Panel Computation; High Performance Level; Dominant Cost |
| Content Type | Text |
| Resource Type | Article |
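
The Amdahl's law argument in the description can be illustrated with a small calculation. This is only a sketch under idealized assumptions that are not taken from the paper: the panel step is assumed not to speed up at all with p, the Level 3 BLAS update is assumed to scale perfectly, and the 10% single-core panel fraction is a hypothetical value. The function names `panel_share` and `speedup` are likewise illustrative.

```python
def panel_share(panel_fraction, p):
    """Share of total runtime spent in the panel step on p cores.

    panel_fraction is the panel's share of runtime on one core
    (a hypothetical input, not a measurement from the paper).
    """
    panel_time = panel_fraction                 # assumed not to speed up with p
    update_time = (1.0 - panel_fraction) / p    # assumed to scale perfectly
    return panel_time / (panel_time + update_time)


def speedup(panel_fraction, p):
    """Overall Amdahl speedup when only the update step scales with p."""
    return 1.0 / (panel_fraction + (1.0 - panel_fraction) / p)


if __name__ == "__main__":
    # Core counts 8 and 32 match the two platforms named in the description.
    for p in (1, 8, 32):
        share = panel_share(0.1, p)
        print(f"p={p:2d}  panel share={share:.3f}  speedup={speedup(0.1, p):.2f}")
```

Under these assumptions the panel's share of runtime grows with p, which is the effect the description attributes to Amdahl's law and the reason the panel factorization is the target of the parallel cache assignment approach.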