Optimizing Away Joins on Data Streams
| Content Provider | CiteSeerX |
|---|---|
| Author | Toman, David; Koudas, Nick; Golab, Lukasz; Johnson, Theodore; Srivastava, Divesh |
| Abstract | Monitoring aggregates on network traffic streams is a compelling application of data stream management systems. Often, streaming aggregation queries involve joining multiple inputs (e.g., client requests and server responses) using temporal join conditions (e.g., within 5 seconds), followed by computation of aggregates (e.g., COUNT) over temporal windows (e.g., every 5 minutes). These types of queries help identify malfunctioning servers (missing responses), malicious clients (bursts of requests during a denial-of-service attack), or improperly configured protocols (short timeout intervals causing many retransmissions). However, while such a query expression is natural, its evaluation over massive data streams is inefficient. In this paper, we develop rewriting techniques for streaming aggregation queries that join multiple inputs. Our techniques identify conditions under which expensive joins can be optimized away, while providing error bounds for the results of the rewritten queries. The basis of the optimization is a powerful but decidable theory in which constraints over data streams can be formulated. We show the efficiency and accuracy of our solutions via experimental evaluation on real-life IP network data using the AT&T Gigascope stream processing engine. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems - Query processing. |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Many Retransmissions; Rewritten Query; Malicious Client; System Query Processing; Real-life IP Network Data; Temporal Window; Error Bound; Denial-of-service Attack; Experimental Evaluation; Short Timeout Interval; Data Stream Management System; Query Expression; Expensive Join; Aggregation Query; Network Traffic Stream; Compelling Application; Database Management; Decidable Theory; Multiple Input; Gigascope Stream Processing Engine; Temporal Join Condition; Client Request; Massive Data Stream; Data Stream |
| Content Type | Text |
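
The kind of query the abstract describes can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the paper or from Gigascope: the function names, the `(timestamp, key)` tuple layout, and the sample data are all assumptions. The first function evaluates the query with an explicit temporal join (a request counts as answered if a response with the same key arrives within 5 seconds) and counts unanswered requests per 5-minute window. The second drops the join and simply subtracts per-window response counts from per-window request counts. It is a naive stand-in for the idea of a join-free rewrite, and it is only reasonable under the assumption that every response matches exactly one request and arrives within the join bound.

```python
from collections import Counter

JOIN_SLACK = 5   # seconds: a response must arrive within this long of its request
WINDOW = 300     # seconds: aggregates are reported per 5-minute window


def unanswered_per_window(requests, responses):
    """Join-based evaluation: count requests with no matching response.

    requests, responses: lists of (timestamp_seconds, key) tuples
    (a hypothetical layout chosen for this sketch).
    Returns {window_start: count}, keyed by the window of the request.
    """
    # Index response timestamps by key so each request only scans its own key.
    by_key = {}
    for ts, key in responses:
        by_key.setdefault(key, []).append(ts)

    counts = Counter()
    for ts, key in requests:
        answered = any(ts <= r <= ts + JOIN_SLACK for r in by_key.get(key, []))
        if not answered:
            counts[ts - ts % WINDOW] += 1
    return dict(counts)


def unanswered_without_join(requests, responses):
    """Join-free estimate: per-window request count minus response count.

    Assumes (for illustration) one response per answered request, arriving
    within JOIN_SLACK seconds. Windows with a zero difference are omitted.
    """
    counts = Counter()
    for ts, _ in requests:
        counts[ts - ts % WINDOW] += 1
    for ts, _ in responses:
        counts[ts - ts % WINDOW] -= 1
    return {w: c for w, c in counts.items() if c != 0}


# Hypothetical sample data: request "c" is answered across a window boundary.
requests = [(10, "a"), (20, "b"), (298, "c"), (310, "d")]
responses = [(12, "a"), (301, "c")]

print(unanswered_per_window(requests, responses))
print(unanswered_without_join(requests, responses))
```

On this sample the two functions agree on the total number of unanswered requests but assign them to windows differently, because the request at second 298 is answered at second 301, in the next window. Under the stated assumptions, such boundary-straddling pairs are the only source of disagreement, which gives a rough sense of why a join-free rewrite can come with a bounded error.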