Apache Spark Course Content
The sections below cover the complete details of the Apache Spark Training in Bangalore course.
During Spark training, learn how to apply data science techniques using parallel programming to explore big (and small) data.
Introduction to Big Data
Challenges with Big Data
Batch vs. Real-Time Big Data Analytics
Batch Analytics – Hadoop Ecosystem Overview
Real-Time Analytics Options
Streaming Data – Storm
In-Memory Data – Spark
What is Spark?
Modes of Spark
Spark Installation Demo
Overview of Spark on a Cluster
Spark Standalone Cluster
In this module, learn how to invoke the Spark shell, build a Spark project with sbt, use distributed persistence, and much more.
Invoking Spark Shell
Creating the Spark Context
Loading a File in Shell
Performing Some Basic Operations on Files in Spark Shell
Building a Spark Project with sbt
Running a Spark Project with sbt
Caching Overview
Distributed Persistence
Spark Streaming Overview
Example: Streaming Word Count
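The streaming word count named above can be pictured as a running tally that is updated one micro-batch at a time. The sketch below is a plain-Python model of that idea only, not Spark Streaming's API: the function `update_counts` and the `batches` list are hypothetical, and the input lines are made-up sample data.

```python
from collections import Counter

def update_counts(running, batch):
    """Fold one micro-batch of text lines into the running word counts."""
    for line in batch:
        running.update(line.lower().split())
    return running

# Hypothetical sample input: each inner list stands for one micro-batch of lines.
batches = [
    ["spark streaming word count", "spark shell"],
    ["word count again"],
]

running = Counter()
for batch in batches:
    update_counts(running, batch)

print(dict(running))
```

In a real streaming job the batches would arrive over time from a socket or message queue rather than from a fixed list; the per-batch update step is the part this sketch is meant to show.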
The main abstraction Spark provides is a resilient distributed dataset (RDD), which is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel.
RDDs
Spark Transformations in RDD
Actions in RDD
Loading Data in RDD
Saving Data through RDD
Spark Key-Value Pair RDD
MapReduce and Pair RDD Operations in Spark
Scala and Hadoop Integration Hands-On
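The module description defines an RDD as a collection partitioned across nodes and operated on in parallel, with transformations and actions as its two kinds of operation. A minimal plain-Python sketch of that idea follows. It is a toy model, not Spark's API: the class `ToyRDD` and its methods are hypothetical, partitions are ordinary lists, and a thread pool stands in for the cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

class ToyRDD:
    """Toy stand-in for an RDD: partitioned data plus pending transformations."""

    def __init__(self, partitions, steps=()):
        self.partitions = partitions
        self.steps = tuple(steps)

    @classmethod
    def parallelize(cls, data, num_partitions):
        data = list(data)
        # Deal elements round-robin into num_partitions lists.
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        # Transformation: only recorded here, nothing is computed yet.
        return ToyRDD(self.partitions, self.steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also recorded lazily.
        return ToyRDD(self.partitions, self.steps + (("filter", pred),))

    def _run_partition(self, part):
        # Apply every recorded step, in order, to a single partition.
        items = part
        for kind, fn in self.steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def collect(self):
        # Action: triggers the computation, one task per partition.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(self._run_partition, self.partitions))
        return [x for part in results for x in part]

    def reduce(self, fn):
        # Action: combine all computed elements into a single value.
        return reduce(fn, self.collect())

rdd = ToyRDD.parallelize(range(1, 11), num_partitions=3)
squares_of_evens = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
total = squares_of_evens.reduce(lambda a, b: a + b)
print(total)  # 220
```

Because elements are spread across partitions, `collect` does not return them in their original order. For the same reason, Spark expects the function passed to `reduce` to be associative and commutative.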