Compiler Optimizations for Cache Locality and Coherence
| Field | Value |
|---|---|
| Content Provider | Semantic Scholar |
| Copyright Year | 1994 |
| Abstract | Almost every modern processor is designed with a memory hierarchy organized into several levels, each of which is smaller, faster, and more expensive than the level below. High performance requires the effective use of cached data, i.e., cache locality. Smart compiler transformations can relieve the programmer of hand-optimizing for specific machine architectures. In a multiprocessor system, data inconsistency may occur between memory and caches; for example, the memory and multiple caches may hold inconsistent copies of the same cache block. This introduces the problem of cache coherence. Several cache coherence protocols have been developed to maintain data coherence across multiple processors. Because multiple variables may be located in the same cache block, false sharing can occur, which many researchers have identified as a major obstacle to high performance. Therefore, in a multiprocessor system, we need to avoid false sharing as well as exploit cache locality. In this paper, we first develop a new data reuse model and an algorithm called height reduction to improve cache locality. The advantage of this algorithm is that it can improve band matrix programs as well as dense matrix programs. It is more accurate and general than existing techniques for improving cache locality, which were developed to optimize dense matrix programs. Then, using the height reduction algorithm, we extend loop tiling to exploit not only intra-tile data locality but also inter-tile data locality. We call the new tiling affinity tiling. Our experiments show that affinity tiling is less sensitive to the choice of tile size. Finally, we show that the algorithm also helps to eliminate or reduce false sharing in multiprocessor systems. With the height reduction algorithm and affinity tiling, significant performance improvements (speedups from 2.5 to 10) have been observed on HP workstations and KSR1 multiprocessors. |
| File Format | PDF; HTM / HTML |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
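For readers unfamiliar with the baseline that the abstract extends, the sketch below shows conventional rectangular loop tiling of a dense matrix multiply. It is *not* the paper's height reduction or affinity tiling algorithm, which the record does not describe in detail. The function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical. Python lists give no control over memory layout, so this illustrates only the reordered iteration space (work proceeds one tile x tile block at a time so a block can be reused while cached), not measured cache gains.

```python
def matmul_naive(A, B):
    """Reference i-j-k triple loop computing C = A * B for square n x n lists."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def matmul_tiled(A, B, tile=32):
    """Conventional loop tiling (hypothetical helper, illustration only).

    The outer "tile" loops (ii, kk, jj) step over tile x tile blocks; the
    inner "point" loops (i, k, j) stay inside one block so that the same
    rows of A, B and C are touched repeatedly before moving on.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # min() handles edge tiles when n is not a multiple of tile.
                for i in range(ii, min(ii + tile, n)):
                    row_a = A[i]
                    row_c = C[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = row_a[k]
                        row_b = B[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_tiled(A, B, tile=1))  # [[19.0, 22.0], [43.0, 50.0]]
```

Tiling only reorders the iterations, so for any tile size the result matches the untiled loop; the choice of `tile` affects only reuse. The abstract's claim is that affinity tiling makes performance less sensitive to that choice than conventional tiling like this.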