Exploring Shared State in Key-Value Store for Window-Based Multi-Pattern Streaming Analytics
| Content Provider | Hyper Articles en Ligne (HAL) |
|---|---|
| Author | Marcu, Ovidiu-Cristian; Tudoran, Radu; Nicolae, Bogdan; Costan, Alexandru; Antoniu, Gabriel; Pérez-Hernández, María S. |
| Copyright Year | 2017 |
| Abstract | We are now witnessing an unprecedented growth of data that needs to be processed at ever-increasing rates in order to extract valuable insights. Big Data streaming analytics tools have been developed to cope with the online dimension of data processing: they enable real-time handling of live data sources by means of stateful aggregations (operators). Current state-of-the-art frameworks (e.g., Apache Flink [1]) enable each operator to work in isolation by creating data copies, at the expense of increased memory utilization. In this paper, we explore the feasibility of deduplication techniques to address the challenge of reducing the memory footprint of window-based stream processing without significant impact on performance. We design a deduplication method specifically for window-based operators that rely on key-value stores to hold a shared state. We experiment with a synthetically generated workload while considering several deduplication scenarios and, based on the results, identify several potential areas of improvement. Our key finding is that more fine-grained interactions between streaming engines and (key-value) stores need to be designed in order to better respond to scenarios that have to overcome memory scarcity. |
| Related Links | https://inria.hal.science/hal-01530744/file/PID4664669.pdf |
| Conference Proceedings | Workshop on the Integration of Extreme Scale Computing and Big Data Management and Analytics in conjunction with IEEE/ACM CCGrid 2017 |
| DOI | 10.1109/ccgrid.2017.126 |
| Language | English |
| Publisher | HAL CCSD |
| Access Restriction | Open |
| Subject Keyword | sliding-window aggregations; Big Data; memory deduplication; streaming analytics; Apache Flink; Distributed, Parallel, and Cluster Computing [cs.DC]; Computer Science [cs] |
| Content Type | Text |
| Resource Type | Conference Proceedings |
| Subject | Medicine |