Big Data Analytics on Cray XC Series DataWarp using Hadoop, Spark and Flink
| Content Provider | Semantic Scholar |
|---|---|
| Author | Schmidtke, Robert; Laubender, Guido; Steinke, Thomas |
| Copyright Year | 2016 |
| Abstract | We currently explore the Big Data analytics capabilities of the Cray XC architecture to harness its computing power for increasingly common programming paradigms for handling large volumes of data. These include MapReduce and, more recently, in-memory data processing approaches such as Apache Spark and Apache Flink. We use our Cray XC Test and Development System (TDS) with 16 diskless compute nodes and eight DataWarp nodes. We use Hadoop, Spark and Flink implementations of selected benchmarks from the Intel HiBench micro benchmark suite and others to find suitable runtime configurations of these frameworks for the TDS hardware. Motivated by preliminary results on throughput per node in the popular Hadoop TeraSort benchmark, we conduct a detailed scaling study and investigate resource utilization. Furthermore, we seek to evaluate scenarios where using DataWarp nodes might be advantageous over using Lustre as the file system backend. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | https://cug.org/proceedings/cug2016_proceedings/includes/files/pap141s2-file1.pdf |
| Alternate Webpage(s) | https://cug.org/proceedings/cug2016_proceedings/includes/files/pap141s2-file2.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |