What is Apache Spark?
Apache Spark is an open-source cluster-computing framework for large-scale data analytics. Spark was originally developed in the AMPLab at UC Berkeley and is part of the Hadoop open-source ecosystem. Spark integrates closely with the Hadoop Distributed File System (HDFS) as a storage layer, though it does not require Hadoop to run.
Why Spark?
Spark offers ease of use, generality, and speed, and it runs almost everywhere: standalone, on Hadoop, or in the cloud.
Why should I learn Spark?
Apache Spark is expected to be the next big thing in Big Data. Thanks to its high performance and dependable applications, Spark is gaining more and more adoption.
Our course curriculum
Our 2-day training covers the following: Spark introduction, installation, Spark operations, components, configuration, applications, job execution, deployment, data workflows, and Spark use cases.
Detailed Course Curriculum
Day 1: Introduction, installation, Spark operations
* Why Spark?
* Problems with traditional large-scale systems
* Introducing Spark
* Spark basics
* Installing Spark
* Using the Spark Shell
* RDDs / HDFS data locality
* Transformations and actions
* Functional programming with Spark
* Spark and the Hadoop ecosystem
* Spark and MapReduce
* RDDs
* RDD operations
* Key-value pair RDDs
* Running Spark on a cluster
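To give a flavor of the transformations and actions covered on Day 1, here is the classic word-count flow sketched in plain Python. The variable names and data are illustrative only; in Spark itself the same steps would be chained on an RDD as flatMap, map, and reduceByKey, and would run distributed across the cluster rather than on one machine.

```python
from collections import defaultdict

# Conceptual sketch of Spark's word-count pattern (not actual Spark API).
# In PySpark this would be roughly:
#   sc.textFile(path).flatMap(lambda l: l.split()) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
lines = ["to be or not to be", "to do or not to do"]  # stand-in for an input file

# flatMap: split each line into words, flattening into one sequence
words = [w for line in lines for w in line.split()]

# map: pair each word with an initial count of 1 (a key-value pair RDD)
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts for each distinct word
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))
```

In Spark, the flatMap/map steps are lazy transformations; no work happens until an action (such as collect or saveAsTextFile) forces evaluation.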
Day 2: Clusters, applications, streaming, and performance
* Overview
* Standalone cluster
* Parallel programming with Spark
* RDD partitions
* Working with partitions
* Executing parallel operations
* Caching and persistence
* Caching overview
* Distributed persistence
* Developing Spark applications
* SparkContext
* Configuring Spark properties
* Building and running a Spark application
* Spark Streaming
* Spark Streaming overview
* Streaming operations
* Developing Spark Streaming applications
* Improving Spark performance
* Shared variables: broadcast variables
* Shared variables: accumulators
* Common performance issues
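The shared variables topic on Day 2 covers broadcast variables (read-only data shipped once to every worker) and accumulators (write-only counters that workers add to). The plain-Python sketch below mimics that division of roles; the data and variable names are made up for illustration, and the Spark equivalents are noted in comments.

```python
# Conceptual sketch of Spark shared variables (not actual Spark API).
# Broadcast variable: in PySpark, bc = sc.broadcast(lookup); workers read bc.value
lookup = {"a": 1, "b": 2}   # read-only lookup table shared with every task

# Accumulator: in PySpark, acc = sc.accumulator(0); workers call acc.add(1)
bad_records = 0             # write-only counter the driver inspects afterward

records = ["a", "b", "x", "a"]  # stand-in for a distributed dataset
results = []
for r in records:           # stands in for rdd.map(...) running on workers
    if r in lookup:
        results.append(lookup[r])
    else:
        bad_records += 1    # acc.add(1) in Spark; only the driver reads the total

print(results, bad_records)
```

Broadcasting avoids re-serializing the lookup table with every task, and accumulators give a safe way to count events (such as malformed records) across workers without shared mutable state.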