Big Data, Spark, NoSQL and Cloud Training

Module 1

Introduction to Big Data
Characteristics
The Why, How and What of Big Data
Existing OLTP, ETL, DWH, OLAP

Module 2

Introduction to Hadoop Ecosystem Architecture-HDFS
Sharding, Distribution and Replication factor (SDR)
Daemons
MapReduce (MRv1) and YARN
Hadoop v1 and v2
Hadoop Data federation

Module 3

Prerequisite for Installation
Single-node, pseudo-distributed and multi-node cluster
Virtual machine using Linux (Ubuntu/CentOS)
Installation and configuration of Hadoop, HDFS daemons, YARN daemons
High Availability (Active and Standby)
Automatic and manual failover
Hadoop fs shell commands
Writing Data to HDFS
Reading Data from HDFS

Module 4

Rack awareness policy and Replica Placement Strategy
Failure Handling
NameNode
DataNode
Block
Safe mode
Rebalancing and load optimization
Troubleshooting and error rectification
Hadoop fs shell commands

Module 5

Introduction to MapReduce
Architecture of MapReduce
Executing MapReduce on YARN
Application Master, ResourceManager and NodeManager
Input format, input split and key-value pairs
Classes and methods of the MapReduce paradigm
Mapper
Reducer
Partitioner
Custom and default partitioners
Shuffle and Sort
Combiner
Scheduler
Application Master / manager
Container
NodeManager

Module 6

MapReduce hands-on: word count program / log analytics
Hadoop Streaming in R/Python
Data processing transformations
Map-only jobs and uber jobs
Inverted index and searches

Module 7

Structured and unstructured data handling
Optimizing using Combiner/Partitioner
Custom partition and default partition

Module 8

Introduction to Hive Data Warehouse
Installation of Hive and the metastore database
Configuring the metastore with MySQL
Creation of Hive tables
Different ways of loading data into Hive
HiveQL commands
Data transformations: joins, filters and others

Module 9

Manipulation and analytical functions in Hive
Managed table and external tables
Partitioning and Bucketing
Complex data types and unstructured data
Advanced HQL commands
UDF and UDAF
Integration with HBase

Module 10

SerDe / Regular Expression
File formats
JSON, Avro file conversion
Converting compressed Parquet files to uncompressed
Avro schema and data file
ORC file

Module 11

Ingest data from RDB
Introduction to Sqoop and installation
Import and export data from and to RDB
Bulk loading, incremental load, split-by, conditional query
Sqoop validation and Sqoop jobs
Data ingestion into Hive
Data ingestion into HBase
Different file formats

Module 12

Ingest streaming data
Flume Architecture
Agent, source, sink, channel
Ingesting log files
Collecting data from Twitter for sentiment analysis

Module 13

Spark core and Components
Spark Shell
Creating RDDs from HDFS/local files
Creating new RDDs: transformations on RDDs
Lineage Graph – DAG
Actions on RDD
Different resource managers
Spark-shell Scala REPL
PySpark
Monitoring jobs

Module 14

Scala/Spark Functional Programming
Using Function Literals
Anonymous Functions
Define a function which accepts another function
Spark Loading and Saving Your Data
Text Files
CSV and TSV files
JSON Files
Spark jobs
Building a Scala program using SBT/Maven
spark-submit and Spark applications

Module 15

RDD Transformation Programming in Depth
Hands-on and core concepts of map() transformation
Hands-on and core concepts of filter() transformation
Hands-on and core concepts of flatMap() transformation
Comparing map() and flatMap() transformations
Apache Spark in Action
Hands-on and core concepts of reduce() action
Hands-on and core concepts of fold() action
Hands-on and core concepts of aggregate() action
Basics of Accumulator
Hands-on and core concepts of collect() action
Hands-on and core concepts of take() action
Ordered access of RDD

Module 16

Creating DataFrames
DataFrames & Datasets
Interoperating with RDDs
JSON and Parquet File Formats
Loading Data through Different Source
RDD to DF and DF to RDD
DataFrame operations (Dataset)

Module 17

Need for Spark SQL
What is Spark SQL?
Spark SQL Architecture
SQLContext in Spark SQL

Module 18

Spark Streaming Overview
Streaming data collections from different sources
Other Streaming Operations
Sliding Window Operation
Developing Spark Streaming Applications
Kafka integration

Module 19

Introduction to NoSQL
ACID vs CAP theorem/BASE
Schema design
Introduction to HBase and installation
The HBase Data Model
The HBase Shell
HBase Architecture
Schema Design

Module 20

The HBase API
HBase Configuration and Tuning
Hive and HBase integration
Loading data using Sqoop
Time to live
Compactions
Tombstone

Module 21

Hue web interface
Hive, Pig editors
Oozie scheduler
Coordinator
Dashboard
Configuration files and monitoring

Module 22

Kafka
Producers, consumers and topics
Flume with Kafka
Kafka topics with Spark Streaming

Module 23

Hadoop distributions
Cloudera components
Hortonworks components
Security
Monitoring
Dashboard

Module 24

Zeppelin notebook
Ambari
Cloudera Manager

Module 25

AWS and Azure in Big Data
S3 or Azure Blob storage components and usage

Module 26

Talend Big Data edition
ETL Tool integration
Data analytics using Tableau
Connecting to the Hadoop Hive server
Interactive visualization

Module 27

Cloudera Spark and Hadoop Developer certification
Hortonworks certification
Guidance and mock exams

Module 28

Introduction to machine learning
Applying machine learning algorithms in Hadoop and Spark MLlib
Classification and clustering

Module 29

Case study 1: Sqoop, HBase, Hive, Spark, Tableau

Module 30

Case study 2: Kafka, Spark Streaming and HBase