Similar Documents
Big Data Data Management Systems performance analysis using Aloja and BigBench
| Content Provider | Semantic Scholar |
|---|---|
| Author | Rivero, Alejandro Montero; Pérez, David Carrera; Poggi, Nicolás |
| Copyright Year | 2018 |
| Abstract | Traditional RDBMSs cannot accommodate the need to analyze large volumes of data that may contain unstructured information while also running interactive applications that pass over the data more than once. SQL-like Big Data infrastructure offers the benefits of Big Data architectures with the ease of the SQL language. In this thesis project, the ALOJA benchmarking platform is extended with the first standardized Big Data benchmark: BigBench. Using ALOJA and BigBench, it is possible to test systems under test (SUTs) and engines in discrete scenarios or to discover possible bottlenecks. A proposed BigBench extension makes it possible to test engine elasticity and how engines react to workloads of varying complexity. We demonstrate the capabilities of ALOJA and BigBench by analyzing the de facto SQL Big Data engine, Hive, against the increasingly popular Spark-SQL. Spark comes out ahead in CPU-intensive applications but lags in disk access when using rotational disks, even throttling network and CPU at high levels of concurrency. Hive requires more memory per CPU core than Spark and becomes unreliable as workload complexity grows. The extended BigBench architecture made it possible to detect a difference in task management between the engines: Hive parallelizes independent tasks, assigning resources according to their complexity, whereas Spark eliminates concurrency by executing tasks sequentially and assigning each one the full cluster's resources. |
| File Format | PDF; HTM / HTML |
| Alternate Webpage(s) | https://upcommons.upc.edu/bitstream/handle/2117/117985/131431.pdf?isAllowed=y&sequence=1 |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
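The abstract contrasts two task-management behaviours: Hive running independent tasks concurrently with resources assigned according to task complexity, and Spark running tasks one at a time, each with the whole cluster. The toy model below is a hypothetical sketch to make that contrast concrete, not the thesis's methodology. It assumes independent tasks, perfectly linear speedup, and invented costs (`work` in core-seconds); the names `Task`, `proportional_concurrent`, and `sequential_full_cluster` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work: float  # hypothetical cost in core-seconds; must be positive


def _check(tasks, cores):
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not tasks or any(t.work <= 0 for t in tasks):
        raise ValueError("need at least one task, each with positive work")


def proportional_concurrent(tasks, cores):
    """All tasks start together; each gets a core share proportional to its work.

    Assumes ideal linear speedup, so every task finishes at total_work / cores.
    """
    _check(tasks, cores)
    total = sum(t.work for t in tasks)
    finish = {}
    for t in tasks:
        share = cores * t.work / total  # cores assigned to this task
        finish[t.name] = t.work / share  # completion time in seconds
    return finish


def sequential_full_cluster(tasks, cores):
    """Tasks run one after another, each using every core of the cluster."""
    _check(tasks, cores)
    clock = 0.0
    finish = {}
    for t in tasks:
        clock += t.work / cores  # this task's runtime on the full cluster
        finish[t.name] = clock  # cumulative completion time in seconds
    return finish


if __name__ == "__main__":
    # Hypothetical workload: three independent queries of increasing cost.
    tasks = [Task("q1", 40.0), Task("q2", 80.0), Task("q3", 120.0)]
    print(proportional_concurrent(tasks, cores=8))
    print(sequential_full_cluster(tasks, cores=8))
```

Under these idealized assumptions both policies share the same makespan (total work divided by cores) and differ only in when individual tasks complete. Any real gap between the engines would therefore come from effects the model deliberately ignores, such as the memory, disk, and network behaviour the abstract reports.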