Big Data Hadoop Developer Training

Module 1

Introduction to Big Data
Characteristics
Why, How and What of Big Data
Existing OLTP, ETL, DWH and OLAP systems

Module 2

Introduction to Hadoop Ecosystem
Architecture-HDFS
Sharding, Distribution and Replication Factor (SDR)
Daemons
MapReduce (MRv1) and YARN
Hadoop v1 and v2
Hadoop Data Federation

Module 3

Prerequisites for installation
Single-node, pseudo-distributed and multi-node clusters
Virtual machine using Linux (Ubuntu/CentOS)
Installation of Hadoop in the cloud (Azure/AWS)
Installation of Java, SSH and Eclipse
Installation and configuration of Hadoop, HDFS daemons and YARN daemons
High Availability (Active and Standby)
Automatic and manual failover
Hadoop FS shell commands
Writing Data to HDFS
Reading Data from HDFS

Module 4

Rack awareness policy and Replica placement Strategy
Failure Handling
Namenode
Datanode
Block-Safe mode
Rebalancing and load optimization
Troubleshooting and error rectification
Hadoop FS shell commands: Unix and Java basics
Assessment 1

Module 5

Introduction to MapReduce
Architecture of MapReduce
Executing MapReduce on YARN
Application Master, Resource Manager and Node Manager
Input format, input split and key-value pairs
Classes and methods of the MapReduce paradigm
Mapper
Reducer
Partitioner
Custom and default partitioners
Shuffle and Sort
Combiner
Scheduler
Application Master/Manager
Container and Node Manager

Module 6

MapReduce hands-on
Word count program / log analytics
Hadoop Streaming in R/Python
Data processing transformations
Map-only jobs and uber jobs
Inverted index and searches

Module 7

MapReduce Programs (Part 2)
Structured and unstructured data handling
Optimizing using a Combiner
Partitioner
Single and multiple columns
Inverted index
XML (semi-structured data)
Map-side joins
Reduce-side joins

Module 8

Introduction to the Hive data warehouse
Installing Hive and the metastore database
Configuring the metastore to use MySQL
HiveQL commands

Module 9

Manipulation and analytical functions in Hive
Managed and external tables
Partitioning and bucketing
Complex data types and unstructured data
Advanced HQL commands
UDF and UDAF
Integration with HBase
SerDe / regular expressions
File formats
JSON to Avro file conversion
Converting compressed Parquet files to uncompressed
Avro schema and data files
ORC files

Assessment 2

Module 10

Introduction to Pig
Installation
Bags and collections
Commands and scripts
Pig UDFs

Module 11

Introduction to NoSQL
ACID, CAP and BASE
Key-value pairs
MapReduce
Column family: HBase
Document: MongoDB
Graph DB: Neo4j

Module 12

Introduction to HBase and installation
The HBase Data Model
The HBase Shell
HBase Architecture
Schema Design
The HBase API
HBase Configuration and Tuning

Module 13

Ingesting data from relational databases (RDB)
Introduction to Sqoop and installation
Importing and exporting data to and from an RDB
Bulk loading, incremental loads, split-by and conditional queries
Sqoop validation and jobs

Module 14

Ingesting streaming data
Flume architecture
Agent, source, channel and sink
Ingesting log files
Collecting data from Twitter for sentiment analysis

Assessment 3

Module 15

Integration with ETL
Talend Big Data edition: big data components

Module 16

Big Data analytics
Dimensional modelling
Data visualization
Tableau: Hive and Spark SQL connectors

Module 17

Spark Core and components
Spark Shell
Creating RDDs from HDFS/local files
Creating new RDDs
Transformations on RDDs
Lineage graph (DAG)
Actions on RDDs
RDD persistence and caching
Lazy evaluation of RDDs
Hands-on and core concepts of the map() transformation
Hands-on and core concepts of the filter() transformation
Hands-on and core concepts of the flatMap() transformation
Comparing the map() and flatMap() transformations
Hands-on and core concepts of the reduce() action
Hands-on and core concepts of the fold() action
Hands-on and core concepts of the aggregate() action
Basics of accumulators
Hands-on and core concepts of the collect() action
Hands-on and core concepts of the take() action
Apache Spark Execution Model
How Spark executes a program
Concepts of RDD partitioning
RDD data shuffling and performance issues

Module 18

DataFrames and Datasets
Spark SQL
PySpark

Module 19

Spark jobs
Building Scala programs using SBT/Maven
spark-submit and Spark applications

Module 20

Kafka: publisher/subscriber model
Producers and consumers

Module 21

HUE
Monitoring and scheduling

Module 22

Zeppelin
Oozie: workflows and coordinators

Module 23

Distribution installation on the cloud or a sandbox
Cloudera: Cloudera Manager
Hortonworks: Ambari Server
MapR: MCS

Module 24

Introduction to data science
Machine learning
Statistical analysis
Sentiment analysis

Module 25

Multi-node cluster setup
High availability
Hadoop data federation
Commissioning and decommissioning
Automatic and manual failover
ZooKeeper Failover Controller

Module 26

Use cases, case studies and proof of concept
Working on different distributions

Module 27

CCA Spark and Hadoop Developer Exam (CCA175)
CCP Data Engineer (DE575)
HDPCD Certification
HDP Certified Apache Spark Developer

Big Data Hadoop Developer Training (Hadoop, Spark, NoSQL, Cloud)