Big Data Analysis with Apache Spark
| Field | Value |
|---|---|
| Content Provider | Semantic Scholar |
| Author | Singh, Pallavi; Anand, Saurabh; Sagar, B. M. |
| Copyright Year | 2017 |
| Abstract | Manipulating big data distributed over a cluster is one of the major challenges faced by most current big-data-oriented companies. This is evident from the popularity of MapReduce and Hadoop and, most recently, Apache Spark, a fast, in-memory distributed collections framework that provides a solution for big data management. This paper discusses how Apache Spark technically helps in big data analysis and management, and concludes that Apache Spark reduces query time by almost 50 percent when working on big data. Cassandra is used as the data source for the experiment, which compares Spark with a plain Cassandra DataSet or ResultSet: the number of records in a Cassandra table was gradually increased, and the time taken to fetch the records from Cassandra using Spark was compared with the time taken using a traditional Java ResultSet. In the initial stages, when the data size was less than 10 percent of the records, Spark's average response time was almost equal to the time taken without Spark. Once the data size exceeded 10 percent of the records, Spark's response time dropped by almost 50 percent compared with querying Cassandra without Spark. The final measurement was taken at 5*10 records; for this really big dataset, Spark reduced the time taken by almost 50 percent relative to the traditional Cassandra ResultSet approach. |
| Starting Page | 6 |
| Ending Page | 8 |
| Page Count | 3 |
| File Format | PDF, HTM / HTML |
| DOI | 10.5120/ijca2017915251 |
| Volume Number | 175 |
| Alternate Webpage(s) | https://d37djvu3ytnwxt.cloudfront.net/assets/courseware/v1/894e32ce1e020f4ce0a3e334eede735b/asset-v1:BerkeleyX+CS110x+2T2016+type@asset+block@Lecture2s.pdf |
| Alternate Webpage(s) | https://www.ijcaonline.org/archives/volume175/number5/28482-28482-2017915251?format=pdf |
| Alternate Webpage(s) | https://d37djvu3ytnwxt.cloudfront.net/assets/courseware/v1/54c1557779ffc2592551c0d23cecde0e/asset-v1:BerkeleyX+CS110x+2T2016+type@asset+block/Lecture1s.pdf |
| Alternate Webpage(s) | https://courses.edx.org/asset-v1:BerkeleyX+CS110x+2T2016+type@asset+block/Lecture3s.pdf |
| Alternate Webpage(s) | https://doi.org/10.5120/ijca2017915251 |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
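The experiment summarized in the abstract times record fetches from a Cassandra table at increasing table sizes and compares two access paths. The following Java sketch shows one way such a comparison could be structured; it is not the authors' code. The class name `FetchBenchmark`, its methods, and the averaging over repeated runs are hypothetical, and the two fetch strategies are passed in as `IntConsumer`s so that no particular Spark or Cassandra API is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/**
 * Hypothetical harness (not from the paper): for each table size, time a
 * baseline fetch strategy and a candidate strategy, then report the relative saving.
 */
final class FetchBenchmark {

    /** Average wall-clock time in milliseconds of running fetch.accept(records) `runs` times. */
    static double timeMillis(IntConsumer fetch, int records, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            fetch.accept(records);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000.0;
    }

    /** Percentage by which the candidate is faster than the baseline (positive = faster). */
    static double percentReduction(double baselineMs, double candidateMs) {
        return (baselineMs - candidateMs) / baselineMs * 100.0;
    }

    /** One row per table size: {records, baselineMs, candidateMs, reductionPercent}. */
    static List<double[]> compare(IntConsumer baseline, IntConsumer candidate, int[] sizes, int runs) {
        List<double[]> rows = new ArrayList<>();
        for (int n : sizes) {
            double b = timeMillis(baseline, n, runs);
            double c = timeMillis(candidate, n, runs);
            rows.add(new double[] {n, b, c, percentReduction(b, c)});
        }
        return rows;
    }
}
```

In a real run, the baseline consumer might iterate a Cassandra Java driver `ResultSet` over the first `n` rows, and the candidate might read the same rows through Spark; both wirings are assumptions. A `percentReduction` value near 50 would correspond to the roughly 50 percent saving the abstract reports for the largest dataset.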