Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load/Store Optimization
| Content Provider | Semantic Scholar |
|---|---|
| Author | Roth, Amir |
| Copyright Year | 2004 |
| Abstract | A high-bandwidth, low-latency load-store unit is a critical component of a dynamically scheduled processor. Unfortunately, it is also one of the most complex and least scalable components. Recently, several researchers have proposed techniques that simplify the core load-store unit and improve its scalability in exchange for the in-order, pre-retirement re-execution of some subset of the program's loads. We call such techniques load/store optimizations. One recent optimization attacks load queue (LQ) scalability by replacing the expensive associative search used to enforce intra- and inter-thread ordering with load re-execution. A second attacks store queue (SQ) scalability by speculatively filtering some load accesses and some store entries from it. The speculatively accessed, speculatively populated SQ can be made smaller and faster, but load re-execution is required to verify the speculation. A third uses a hardware table to identify redundant loads and skip their execution altogether. Redundant load elimination is highly accurate but not 100% accurate, so re-execution is needed to flag false eliminations. Unfortunately, the inherent benefits of load/store optimizations are offset by re-execution itself. Re-execution contends with store retirement for cache bandwidth, and it serializes load re-execution with subsequent store retirement. If a technique requires enough load re-executions, their cost can outweigh the technique's benefits entirely and may even produce drastic slowdowns. This is the case for the SQ technique. Store Vulnerability Window (SVW) is a new mechanism that significantly reduces the re-execution requirements of a given load/store optimization, by an average of 85% across the three load/store optimizations we study. This reduction relieves cache port contention, removes many of the dynamic serialization events that account for the bulk of re-execution's cost, and allows these techniques to reach their full potential. For the scalable SQ optimization, it is what makes the technique viable at all: without SVW, this technique posts significant slowdowns. SVW is a simple scheme based on monotonic store sequence numbering and a novel application of Bloom filtering. The cost of an effective SVW implementation is a 1KB buffer and a 2B field per LQ entry. |
| Comments | University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-04-29. This technical report is available at ScholarlyCommons: https://repository.upenn.edu/cis_reports/35 |
| File Format | PDF, HTM / HTML |
| Alternate Webpage(s) | http://repository.upenn.edu/cgi/viewcontent.cgi?article=1023&context=cis_reports |
| Alternate Webpage(s) | https://repository.upenn.edu/cgi/viewcontent.cgi?article=1023&context=cis_reports&httpsredir=1&referer= |
| Alternate Webpage(s) | http://www.cis.upenn.edu/departmental/reports/svw-tr04.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
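The abstract describes SVW only at a high level: monotonic store sequence numbering combined with a Bloom-filter-like table, costing a 1KB buffer plus a 2B field per LQ entry. The Python sketch below illustrates how such a filter could decide whether a load must actually be re-executed. It is a simplified model built on our own assumptions, not the report's design. It assumes a direct-mapped, untagged table of 512 store sequence numbers (512 × 2 B = 1 KB, chosen only to match the quoted buffer size), 8-byte address granularity, and no sequence-number wraparound. The names `SVWFilter`, `ssbf_index`, and `ssn_nvul` are hypothetical labels.

```python
"""Minimal SVW-style re-execution filter sketch (illustrative assumptions only)."""
from dataclasses import dataclass, field

NUM_ENTRIES = 512      # assumption: 512 entries x 2-byte SSN = 1 KB
GRANULARITY_SHIFT = 3  # assumption: table indexed by 8-byte-aligned address


def ssbf_index(addr: int) -> int:
    # Untagged, direct-mapped hash (hypothetical): distinct addresses may alias
    # to the same entry, which can only add re-executions, never hide one.
    return (addr >> GRANULARITY_SHIFT) % NUM_ENTRIES


@dataclass
class SVWFilter:
    # Each entry holds the SSN of the youngest committed store that hashed there.
    table: list = field(default_factory=lambda: [0] * NUM_ENTRIES)
    next_ssn: int = 1    # monotonic store sequence numbers (wraparound not modeled)
    ssn_commit: int = 0  # SSN of the youngest store committed so far

    def dispatch_store(self) -> int:
        # Stores receive sequence numbers in program order.
        ssn = self.next_ssn
        self.next_ssn += 1
        return ssn

    def commit_store(self, ssn: int, addr: int) -> None:
        # Stores commit in order, so table entries only ever grow.
        self.table[ssbf_index(addr)] = ssn
        self.ssn_commit = ssn

    def needs_reexecution(self, addr: int, ssn_nvul: int) -> bool:
        # ssn_nvul: youngest older store the load is known to be safe against
        # (e.g. ssn_commit at the time the load executed). A committed store
        # younger than that, to a matching entry, means the load may be stale.
        return self.table[ssbf_index(addr)] > ssn_nvul


if __name__ == "__main__":
    svw = SVWFilter()
    s1 = svw.dispatch_store()       # store to 0x100
    s2 = svw.dispatch_store()       # store to 0x200
    svw.commit_store(s1, 0x100)
    nvul = svw.ssn_commit           # two loads execute here, after s1 commits
    svw.commit_store(s2, 0x200)
    print(svw.needs_reexecution(0x100, nvul))  # False: re-execution skipped
    print(svw.needs_reexecution(0x200, nvul))  # True: s2 committed later
```

Because the table is untagged and entries only increase under in-order store commit, an alias can make a load look vulnerable when it is not (an unnecessary but safe re-execution). It can never make a genuinely vulnerable load look safe. That conservative behavior is what lets a small Bloom-filter-style structure stand in for a full address comparison.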