Brawny cores still beat wimpy cores, most of the time
| Content Provider | Semantic Scholar |
|---|---|
| Author | Hölzle, Urs |
| Copyright Year | 2010 |
| Abstract | Slower but energy-efficient "wimpy" cores only win for general workloads if their single-core speed is reasonably close to that of mid-range "brawny" cores. At Google, we've been long-term proponents of multicore architectures and throughput-oriented computing. In warehouse-scale systems, throughput is more important than single-threaded peak performance, because no single processor can handle the full workload. In addition, maximizing single-threaded performance costs power through larger die areas (for example, for larger reorder buffers or branch predictors) and higher clock frequencies. Multicore architectures are great for warehouse-scale systems because they provide ample parallelism in the request stream as well as data parallelism for search or analysis over petabyte data sets. We classify multicore systems as brawny-core systems, whose single-core performance is fairly high, or wimpy-core systems, whose single-core performance is low. The latter are more power efficient: CPU power typically decreases by approximately O(k²) when CPU frequency decreases by k, and lowering DRAM access speeds along with core speeds can save additional power. So why doesn't everyone want wimpy-core systems? Because in many corners of the real world, they're prohibited by law: Amdahl's law. Even though many Internet services benefit from seemingly unbounded request- and data-level parallelism, such systems aren't above the law. As the number of parallel threads increases, reducing serialization and communication overheads can become increasingly difficult, and in the limit the inherently serial work performed on behalf of a user request by slow single-threaded cores dominates overall execution time. Cost numbers used by wimpy-core evangelists also tend to exclude software development costs. Unfortunately, wimpy-core systems can require applications to be explicitly parallelized or otherwise optimized for acceptable performance. For example, suppose a Web service runs with a latency of one second per user request, half of it caused by serial CPU time. If we switch to wimpy-core servers whose single-threaded performance is three times slower, the response time doubles to two seconds, and developers might have to spend substantial effort optimizing the code to get back to the one-second latency. Software development costs often dominate a company's overall technical expenses, so forcing programmers to parallelize more code can cost more than is saved on the hardware side. Most application programmers prefer to think of an individual request as a single-threaded program, leaving the harder parallelization problem to middleware that exploits request-level parallelism (that is, it runs independent user requests … |
| File Format | PDF, HTM/HTML |
| Alternate Webpage(s) | https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36448.pdf |
| Alternate Webpage(s) | http://research.google.com/pubs/archive/36448.pdf |
| Alternate Webpage(s) | http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36448.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
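The latency example in the abstract is a simple Amdahl's-law-style calculation: the serial portion of a request slows down with the core, while the parallelized portion can be held constant by adding more cores. A minimal sketch of that arithmetic, assuming these function and parameter names (they are illustrative, not from the paper):

```python
def wimpy_latency(total_s: float, serial_fraction: float, slowdown: float) -> float:
    """Request latency after moving to cores `slowdown` times slower.

    Assumes the parallel portion keeps the same wall-clock time
    (absorbed by adding more wimpy cores), while the serial portion
    stretches by the slowdown factor.
    """
    serial = total_s * serial_fraction   # time that cannot be parallelized
    parallel = total_s - serial          # compensated by extra cores
    return serial * slowdown + parallel

# The abstract's example: a 1 s request, half serial, cores 3x slower.
print(wimpy_latency(1.0, 0.5, 3.0))  # → 2.0
```

This reproduces the doubling cited in the abstract: the 0.5 s serial portion triples to 1.5 s, and adding the unchanged 0.5 s parallel portion gives 2.0 s total.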