Supercomputing on massively parallel bit-serial architectures
| Content Provider | Semantic Scholar |
|---|---|
| Author | Iobst, Ken |
| Copyright Year | 1985 |
| Abstract | Consider the idea that supercomputing is a synergy of generic algorithms, languages and architectures and that real breakthroughs in parallel computing will be achieved by considering all three together in a simulated software environment. Engineering tradeoffs could be made between performance, machine transparency, standardization and program portability before any new machines are actually built. Standardized languages could be developed for generic subclasses of parallel machines: languages that really give high performance and encourage free parallel expression and "thinking in parallel". My own research on the Goodyear MPP (Massively Parallel Processor) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined "train" broadcast, and conditional branching at the PE control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. The MPP timing estimate for Gaussian elimination of a 4K by 4K single bit matrix is under one second, the equivalent of approximately 64 billion scalar operations. Parallel Gauss-Jordan matrix inversion is also being investigated. The estimated time to invert a 128 X 128, 32 bit real matrix using full pivoting on the MPP is 50 msec. This is roughly equivalent to a 100 MFLOP scalar rate. The MPP is a SIMD machine of 16384 single bit processors arranged in a 128 X 128 array. Individual PE's are interconnected with their four nearest neighbors. Each PE can address 1024 bits of its own local memory. A 32 bit shift register in each PE allows for micro-pipelining of long words and faster partial sum accumulation for multiplication. The machine can execute 160 billion micro-instructions per second, which translates to 800 GOPS for some instructions. Operations include single bit logical, shift, and add as well as column I/O and one or two dimensional routing in a spiral, cylinder, or torus. All operations can be directly or indirectly masked. The logical "or" of one bit per PE (SUMOR) can be used to pass array information back to the PE control unit for broadcast to other PE's, scalar I/O or conditional branching. … |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19870019702.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
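
The abstract describes masked, single-bit operations executed in lockstep by every PE, with SUMOR reporting a one-bit summary back to the control unit. The Python sketch below is a toy model of that idea, not the MPP's instruction set or the author's language: the "bit plane" layout and the names `to_planes`, `from_planes`, `bit_serial_add`, and `sum_or` are all assumptions made for illustration. It also hints at why variable precision trades accuracy for speed: the add loop runs once per bit of operand width.

```python
# Toy model (assumed, not from the source): a "plane" is a list holding
# one bit per PE; an operand of width w is stored as w planes, LSB first.

def to_planes(values, width):
    """Split one integer per PE into `width` bit planes, LSB first."""
    return [[(v >> k) & 1 for v in values] for k in range(width)]

def from_planes(planes):
    """Reassemble one integer per PE from its bit planes."""
    n = len(planes[0])
    return [sum(planes[k][i] << k for k in range(len(planes)))
            for i in range(n)]

def bit_serial_add(a_planes, b_planes, mask):
    """Add b to a one bit plane at a time, in lockstep across all PEs.

    PEs whose mask bit is 0 keep their original `a` value. The result
    has the same width as the inputs, so a final carry is dropped.
    """
    carry = [0] * len(mask)
    out = []
    for a, b in zip(a_planes, b_planes):
        # Full adder applied to the same bit position in every PE.
        s = [x ^ y ^ c for x, y, c in zip(a, b, carry)]
        carry = [(x & y) | (c & (x ^ y)) for x, y, c in zip(a, b, carry)]
        # Masked write: only selected PEs store the new sum bit.
        out.append([si if m else x for si, x, m in zip(s, a, mask)])
    return out

def sum_or(plane):
    """Logical OR of one bit from every PE, as seen by the control unit."""
    return int(any(plane))
```

With four hypothetical PEs holding 1, 2, 3, 7, adding 1 everywhere under mask `[1, 0, 1, 1]` leaves the second PE unchanged, and `sum_or` on the top result plane tells the control unit whether any PE now has its high bit set, which is the kind of value a conditional branch could test.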