Big Data Hadoop Developer Training

Training in Chennai

Module 1

Introduction to Big Data


The Why, How and What of Big Data


Module 2

Introduction to Hadoop Ecosystem


MapReduce (MRv1)

Hadoop v1 and v2

Hadoop Data Federation

Module 3

Prerequisites for Installation

VM, Linux (Ubuntu/CentOS), JDK, SSH, Eclipse

Installation and configuration of Hadoop, HDFS daemons, YARN daemons

High Availability

Automatic and manual failover

Writing Data to HDFS

Reading Data from HDFS

Module 4

Replica placement Strategy

Failure Handling



Block-Safe mode

Rebalancing and load optimization

Troubleshooting and error rectification

Hadoop fs shell commands

Unix and Java basics

Module 5

Introduction to MapReduce

Architecture of MapReduce

Execution of MapReduce in YARN

Application Master, Resource Manager and Node Manager

InputFormat and key-value pairs




Custom and Default

Shuffle and Sort


App Master/Manager

Container-Node manager

Module 6

MapReduce Hands-on

Word count program / log analytics

Hadoop streaming in R and Python

Data processing Transformations

Map only jobs and uber jobs

Inverted index and searches
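
Illustration for this module: the word count exercise follows the map, shuffle-and-sort, reduce pattern. The sketch below imitates that flow in plain Python on in-memory lines. It is not Hadoop API code, and the function names (mapper, shuffle, reducer, word_count) are hypothetical names chosen for this example only.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key and sort the keys, like shuffle and sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())
```

Calling word_count(["big data", "Big Hadoop"]) counts "big" twice because the mapper lowercases each line before splitting it.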

Module 7

MR Programs 2

Structured and Unstructured Data handling



Single and multiple column

Inverted Index

XML (semi-structured data)

Map side joins

Reduce side join
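
Illustration for this module: a reduce-side join tags each record with its source table in the map phase, groups the tagged records by join key in the shuffle, and pairs them up in the reducer. The plain-Python sketch below imitates this on small in-memory lists. The function name reduce_side_join and the "L"/"R" tags are assumptions made for the example, not part of any Hadoop API.

```python
from collections import defaultdict


def reduce_side_join(left, right):
    """Inner-join two lists of (key, value) records, reduce-side style."""
    # Map phase: tag each record with the table it came from.
    tagged = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]

    # Shuffle phase: group the tagged records by join key.
    groups = defaultdict(list)
    for key, record in tagged:
        groups[key].append(record)

    # Reduce phase: pair every left value with every right value per key.
    joined = []
    for key in sorted(groups):
        lefts = [v for tag, v in groups[key] if tag == "L"]
        rights = [v for tag, v in groups[key] if tag == "R"]
        joined.extend((key, l, r) for l in lefts for r in rights)
    return joined
```

Keys that appear in only one input produce no output rows, which makes this an inner join. A map-side join avoids the shuffle by loading the smaller table into memory on every mapper instead.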

Module 8

Introduction to Hive Data Warehouse

Configuring the metastore with MySQL

HiveQL commands

Module 9

Manipulation and analytical functions in Hive

Managed table and external tables

Partitioning and Bucketing

Complex data types and Unstructured data

Advanced HQL commands

Integration with HBase

SerDe / Regular Expression

Module 10

Introduction to Pig

Installation

Bags and collections

Commands and Scripts


Module 11

Introduction to NoSQL


Key value pair

MapReduce

Column family (HBase)

Document


Graph DB


Module 12

Introduction to HBase and installation

The HBase Data Model

The HBase Shell

HBase Architecture

Schema Design

The HBase API

HBase Configuration and Tuning

Module 13

Introduction to Sqoop and installation

Bulk loading

Hadoop Streaming

Module 14

Flume Architecture

Agent, source, sink, channel

Ingesting log files

Collecting data from Twitter for sentiment analysis

Module 15

Integration with ETL: Talend Open Studio


Module 16

Big Data Analytics

Visualization, dimensional modelling, Tableau

Module 17


Spark Shell Hands On Using HDFS

Create RDD from HDFS file

Creating new RDDs

Transformations on RDDs

Lineage Graph

Actions on RDD

RDD concepts: persist and cache

Lazy evaluation of RDDs

Hands on and core concepts of map() transformation

Hands on and core concepts of filter() transformation

Hands on and core concepts of flatMap() transformation

Comparing map() and flatMap() transformations

Hands on and core concepts of reduce() action

Hands on and core concepts of fold() action

Hands on and core concepts of aggregate() action

Basics of Accumulator

Hands on and core concepts of collect() action

Hands on and core concepts of take() action

Apache Spark Execution Model

How Spark executes a program

Concepts of RDD partitioning

RDD data shuffling and performance issues
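
Illustration for this module: the difference between map() and flatMap(), and the way aggregate() combines per-partition results, can be tried out without a cluster. The sketch below models an RDD as a plain Python list of partitions. The helper names (rdd_map, rdd_flat_map, rdd_aggregate) are hypothetical stand-ins written for this example. They are not the Spark API, and they run eagerly, so they do not show lazy evaluation.

```python
from functools import reduce


def rdd_map(partitions, f):
    """Apply f to each element: one output element per input element."""
    return [[f(x) for x in part] for part in partitions]


def rdd_flat_map(partitions, f):
    """Apply f to each element and flatten the iterables it returns."""
    return [[y for x in part for y in f(x)] for part in partitions]


def rdd_aggregate(partitions, zero, seq_op, comb_op):
    """Fold each partition with seq_op, then merge the results with comb_op."""
    per_partition = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, per_partition, zero)


# Two partitions of text lines.
lines = [["big data"], ["hadoop spark hive"]]
as_lists = rdd_map(lines, str.split)       # each line becomes one list of words
as_words = rdd_flat_map(lines, str.split)  # each line becomes several words

# Sum and count in one pass, the usual first step towards an average.
numbers = [[1, 2], [3, 4]]
total, count = rdd_aggregate(
    numbers,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
```

Because the zero value is applied once per partition and once more in the final merge, it should be a neutral element for both functions. A non-neutral zero would make the result depend on the number of partitions.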

Module 18

Spark SQL

Module 19

spark-submit and Spark applications

Kafka: Publisher/Subscriber

Consumer and producer

Module 22

Cloudera Manager and VM, Hue

Module 23

Oozie: Workflow and Coordinator

Module 24

Introduction to Data science

Introduction to Machine learning

Introduction to Statistical Analysis

Introduction to Sentiment Analysis

Introduction to Cloudera/Hortonworks/Greenplum

Module 25

Multi-node cluster setup

High Availability

Hadoop data federation

Commissioning and decommissioning

Automatic and manual failover

ZooKeeper failover controller

Use cases, Case studies and Proof of Concept

Working on different Distributions

Module 26 (Optional)

Cloudera and Hortonworks certification questions discussion

