Efficient data distribution strategy for join query processing in the cloud
| Content Provider | ACM Digital Library |
|---|---|
| Author | Wang, Haiping; Meng, Xiaofeng; Chai, Yunpeng |
| Abstract | Managing large-scale data in the cloud offers many advantages, and more and more companies are starting to migrate their data into cloud data management systems. Join query processing has become a challenging research problem in the cloud. To finish a join query in the cloud, data must be transferred among different nodes. The arrangement of data transmission and local data processing is known as a distribution strategy for a query. The transmission cost (network workload between servers and the transmission time delay) will be very high if the strategy is not properly chosen. Existing cloud systems either do not support join queries or use MapReduce to support only some simple join queries. This paper studies the problem of using redundant data for join query optimization in a cloud environment. Two novel algorithms, a Set Cover based algorithm (SC) and a Minimum Element based algorithm (ME), are proposed to reduce the data transmission cost. The experimental results demonstrate that the proposed methods greatly reduce the data transmission cost compared with the naive method, and that the result is very close to the optimal strategy. |
| Starting Page | 15 |
| Ending Page | 22 |
| Page Count | 8 |
| File Format | |
| ISBN | 9781450309561 |
| DOI | 10.1145/2064085.2064089 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publisher Date | 2011-10-28 |
| Publisher Place | New York |
| Access Restriction | Subscribed |
| Subject Keyword | Cloud computing; Distribution strategy; Replicate; Join |
| Content Type | Text |
| Resource Type | Article |
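
The abstract names a Set Cover based algorithm but does not describe it. As a rough illustration of the underlying idea only, the sketch below applies the textbook greedy set cover heuristic to replicated data: when partitions are stored redundantly, a query can choose a small group of nodes that together hold everything the join needs. This is not the paper's SC algorithm; the function name `greedy_set_cover`, the node names, and the partition ids are all hypothetical.

```python
def greedy_set_cover(required, replicas):
    """Choose nodes until every required partition is covered.

    required: set of partition ids the join needs (hypothetical input)
    replicas: dict mapping node name -> set of partition ids stored there
    Returns the chosen node names in selection order.
    """
    uncovered = set(required)
    chosen = []
    while uncovered:
        # Pick the node holding the most still-uncovered partitions.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(replicas), key=lambda n: len(replicas[n] & uncovered))
        gained = replicas[best] & uncovered
        if not gained:
            raise ValueError(f"partitions not stored on any node: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical placement: five partitions replicated across four nodes.
replicas = {
    "n1": {1, 2, 3},
    "n2": {3, 4},
    "n3": {4, 5},
    "n4": {1, 5},
}
nodes = greedy_set_cover({1, 2, 3, 4, 5}, replicas)
print(nodes)
```

The greedy heuristic does not guarantee a minimum cover in general, and a real distribution strategy would also have to weigh data sizes and network cost, which this sketch ignores.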