Musketeer: all for one, one for all in data processing systems
| Content Provider | CiteSeerX |
|---|---|
| Author | Gog, Ionel; Schwarzkopf, Malte; Crooks, Natacha; Grosvenor, Matthew P.; Clement, Allen; Steven, H. |
| Abstract | Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is “best” for a given workflow. Porting workflows between systems is tedious. Hence, users become “locked in”, despite faster or more efficient systems being available. This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g., MapReduce, Spark, PowerGraph, Naiad). We argue that the ways that workflows are defined should be decoupled from the manner in which they are executed. To explore this idea, we have built Musketeer, a workflow manager which can dynamically map front-end workflow descriptions to a broad range of back-end execution engines. Our prototype maps workflows expressed in four high-level query languages to seven different popular data processing systems. Musketeer speeds up realistic workflows by up to 9× by targeting different execution engines, without requiring any manual effort. Its automatically generated back-end code comes within 5%–30% of the performance of hand-optimized implementations. |
| File Format | |
| Access Restriction | Open |
| Content Type | Text |