Saturday, December 26, 2015

IT (8): Spark

Spark
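Hadoop handles the storage and analysis of big data. However, its data processing is slow because MapReduce jobs incur replication, serialization, and disk I/O between steps.
Spark was developed to speed up this kind of analysis. Spark does not depend on Hadoop, though: running on a Hadoop cluster (for example, on YARN with data in HDFS) is just one of the ways to deploy Spark, and it can also run in standalone mode or on other cluster managers.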
Apache Spark is a fast, general-purpose cluster computing engine.
It covers a wide range of workloads, including batch applications, iterative algorithms, streaming, and interactive queries.
It makes an application fast by storing data in memory and minimizing the number of read/write operations to the disk.
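
To see why keeping data in memory helps, here is a small sketch in plain Python. It does not use Spark at all: `CountingSource`, `iterate_from_disk`, and `iterate_from_memory` are made-up names for this illustration, and a temporary file stands in for a dataset on disk. The point is only that an iterative job reading from disk pays the read cost on every pass, while a cached copy pays it once.

```python
import os
import tempfile


class CountingSource:
    """Hypothetical stand-in for a dataset on disk; counts how often it is read."""

    def __init__(self, numbers):
        fd, self.path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(str(n) for n in numbers))
        self.disk_reads = 0

    def load(self):
        # Every call goes back to the file, like a MapReduce step re-reading its input.
        self.disk_reads += 1
        with open(self.path) as f:
            return [int(line) for line in f]


def iterate_from_disk(source, passes):
    """Re-read the data from disk on every pass of an iterative computation."""
    total = 0
    for _ in range(passes):
        total += sum(source.load())
    return total


def iterate_from_memory(source, passes):
    """Read the data once, keep it in memory, and reuse it on every pass."""
    cached = source.load()
    total = 0
    for _ in range(passes):
        total += sum(cached)
    return total


disk_source = CountingSource(range(1, 101))
mem_source = CountingSource(range(1, 101))

disk_total = iterate_from_disk(disk_source, passes=5)
mem_total = iterate_from_memory(mem_source, passes=5)

print(disk_total, disk_source.disk_reads)  # same result, one read per pass
print(mem_total, mem_source.disk_reads)    # same result, a single read

# Remove the temporary files created for the illustration.
os.remove(disk_source.path)
os.remove(mem_source.path)
```

Both versions compute the same total, but the cached one touches the disk only once. In Spark itself the caching is handled by the engine and spread over a cluster; this sketch shows only the idea.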
Beyond 'map' and 'reduce' operations, Spark also supports SQL queries (Spark SQL), machine learning (MLlib), streaming data (Spark Streaming), and graph algorithms (GraphX).
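Spark's core data structure is called the RDD (Resilient Distributed Dataset): an immutable collection split into partitions across the cluster, which can be recomputed if a partition is lost.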
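
The idea behind an RDD can be sketched with a toy model in plain Python. This is not Spark's API or implementation: `ToyRDD` and `compute_partition` are hypothetical names for this illustration, and everything runs in one process instead of on a cluster. The method names `map`, `filter`, `reduce`, and `collect` loosely mirror the operations of the same names in Spark. The sketch shows two properties: transformations are only recorded until a result is requested, and any partition can be rebuilt from the source data plus the recorded steps.

```python
import functools


class ToyRDD:
    """Toy model of an RDD: source partitions plus a recorded list of transformations."""

    def __init__(self, partitions, lineage=()):
        self._partitions = [list(p) for p in partitions]  # source data, never modified
        self._lineage = tuple(lineage)                     # steps recorded, not yet executed

    def map(self, fn):
        # Transformations return a new ToyRDD and do no work yet.
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def compute_partition(self, index):
        # Replay the recorded steps on one partition. The same replay is how
        # a lost partition could be rebuilt without copying the data elsewhere.
        data = self._partitions[index]
        for kind, fn in self._lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def collect(self):
        # Actions trigger the actual computation over all partitions.
        return [x for i in range(len(self._partitions))
                for x in self.compute_partition(i)]

    def reduce(self, fn):
        return functools.reduce(fn, self.collect())


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])  # two partitions
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

total = even_squares.reduce(lambda a, b: a + b)
rebuilt = even_squares.compute_partition(1)  # recompute only the second partition

print(even_squares.collect())
print(total)
print(rebuilt)
```

Nothing is computed when `map` and `filter` are called; the work happens only in `collect` and `reduce`. Real Spark follows the same pattern of lazy transformations and actions, but distributes the partitions over many machines.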
Spark is written in Scala, and it provides APIs for Scala, Java, Python, and R.
###################################################
