Apache Spark Training in Chennai
Module 1
- Big Data Landscape
- Why Big Data? The 3 Vs (Volume, Velocity, Variety)
- Hadoop Ecosystem
- Introduction to Apache Spark
- Features of Apache Spark
- Apache Spark Stack
- Introduction to RDDs
- RDD Transformations
- What Is Good and Bad in MapReduce?
- Why Use Apache Spark?
Module 2
- Installation
  - Single-node setup, including:
    - Hadoop
    - Apache Spark
    - Hive
    - Sqoop
    - Hue
Module 3
- Deep Dive into HDFS
  - HDFS Design
  - Fundamentals of HDFS
  - Rack Awareness
  - Reading from and Writing to HDFS
  - HDFS Federation and High Availability (Hadoop 2.x)
  - HDFS Command Line Interface
Module 4
- Spark Shell Hands-On Using HDFS
  - Spark Shell Introduction
  - Creating a file using Hue
  - Extracting a file from HDFS in the Spark shell
  - Creating an RDD from an HDFS file
Module 5
- Programming with RDDs, Part 1
  - Creating new RDDs
  - Transformations on RDDs
  - Lineage Graph
  - Actions on RDDs
  - RDD persistence and caching: persist() and cache()
  - Lazy evaluation of RDDs
Module 6
- Scala/Spark Functional Programming
  - Using Function Literals
  - Anonymous Functions
  - Defining a function that accepts another function (higher-order functions)
Module 7
- RDD Transformation Programming in Depth
  - Hands-on and core concepts of the map() transformation
  - Hands-on and core concepts of the filter() transformation
  - Hands-on and core concepts of the flatMap() transformation
  - Comparing the map() and flatMap() transformations
Module 8
- Apache Spark in Action
  - Hands-on and core concepts of the reduce() action
  - Hands-on and core concepts of the fold() action
  - Hands-on and core concepts of the aggregate() action
  - Basics of accumulators
  - Hands-on and core concepts of the collect() action
  - Hands-on and core concepts of the take() action
  - Ordered access to RDD elements
Module 9
- Apache Spark Execution Model
  - How Spark executes a program
  - Concepts of RDD partitioning
  - RDD data shuffling and performance issues
Module 10
- Apache Spark PairRDD
  - Core concepts of PairRDD
  - Creating a PairRDD
  - Aggregation in PairRDD
  - Understanding aggregation functions in depth
  - How does reduceByKey() work conceptually?
  - How does foldByKey() work conceptually?
  - How does combineByKey() work conceptually?
Module 11
- Spark PairRDD Hands-On Lab
  - reduceByKey
  - foldByKey
  - combineByKey
  - groupByKey
Module 12
- Spark PairRDD Joining, Zipping and Cogrouping
  - reduceByKey versus groupByKey: performance issues
  - cogroup
  - zip
  - Joining (inner, left outer, right outer, etc.)
Module 13
- Understanding Hadoop SequenceFile
  - Creating a SequenceFile and processing it using Spark
  - Creating a SequenceFile from a TSV file
  - Loading data into Apache Hive
  - Processing a SequenceFile as an RDD
Module 14
- Spark Shared Variables
  - Broadcast variables
  - Accumulators
Module 15
- Spark Accumulators
  - Word count and character count
  - Counting bad records in a file
Module 16
- Spark Broadcast Variables
  - Joining two CSV files, using one as a broadcast lookup table
Module 17
- Spark API
  - Broadcast variables, filter functions and saving files
Module 18
- Spark API
  - Spark join, groupBy and swap functions
Module 19
- Spark API
  - Removing the header from a CSV file and mapping each column to row data
Module 20
- Spark SQL
  - HiveContext
  - SchemaRDD replaced by the DataFrame API
  - History of Spark SQL
  - Catalyst Optimizer
Module 21
- Spark SQL Hands-On Sessions
  - Hive configuration
  - Creating a Hive table using Spark
  - Loading data into a Hive table using Spark
  - Creating another table using a DataFrame
Module 22
- Implementing Business Logic Using Spark SQL
  - Loading a CSV file
  - Scala case classes (to define the schema for a CSV file)
  - Converting an RDD to a DataFrame using the DataFrame API to query data
  - Running SQL queries on a DataFrame
Module 23
- Spark: Loading and Saving Your Data
  - Text files
  - CSV and TSV files
  - JSON files
Module 24
- Spark: Loading and Saving Your Data (SQL and NoSQL)
  - JDBC (MySQL)
  - HBase (NoSQL)
Module 25
- Writing Spark Applications
  - Spark Applications vs. Spark Shell
  - Creating the SparkContext
  - Configuring Spark Properties
  - Building and Running a Spark Application
  - Logging
Module 26
- Spark Streaming in Depth, Part 1
  - Spark Streaming overview
  - Example: streaming word count
Module 27
- Spark Streaming in Depth, Part 2
  - Other streaming operations
  - Sliding window operations
  - Developing Spark Streaming applications
Module 28
- Spark Algorithms, Part 1
  - Iterative algorithms
  - Graph analysis
  - Machine learning
Module 29
- Case studies