Big Data Hadoop Developer Training
Training in Chennai
Module 1
Characteristics
The Why, How and What of Big Data
Existing OLTP, ETL, DWH, OLAP
Module 2
Introduction to Hadoop Ecosystem
Architecture-HDFS
MapReduce (MRv1)
Hadoop v1 and v2
Hadoop data federation
Module 3
Prerequisites for Installation
Linux VM (Ubuntu/CentOS), JDK, SSH, Eclipse
Installation and configuration of Hadoop, HDFS daemons, YARN daemons
High Availability
Automatic and manual failover
Writing Data to HDFS
Reading Data from HDFS
Module 4
Replica placement Strategy
Failure Handling
NameNode
DataNode
Block-Safe mode
Rebalancing and load optimization
Troubleshooting and error rectification
Hadoop fs shell commands; Unix and Java basics
Module 5
Introduction to MapReduce
Architecture of MapReduce
MapReduce execution in YARN
Application Master, ResourceManager and NodeManager
InputFormat and key-value pairs
Mapper
Reducer
Partitioner
Custom and Default
Shuffle and Sort
Combiner
Scheduler
Application Master / Application Manager
Container and NodeManager
Module 6
MapReduce Hands-on
Word count program / log analytics
Hadoop streaming in R and Python
Data processing transformations
Map-only jobs and uber jobs
Inverted index and searches
Module 7
MR Programs 2
Structured and Unstructured Data handling
Combiner
Partitioner
Single and multiple column
Inverted Index
XML (semi-structured data)
Map-side joins
Reduce-side joins
Module 8
Introduction to Hive Data Warehouse
Installation
Configuring the metastore with MySQL
HiveQL commands
Module 9
Manipulation and analytical functions in Hive
Managed table and external tables
Partitioning and Bucketing
Complex data types and Unstructured data
Advanced HQL commands
UDF and UDAF
Integration with HBase
SerDe / Regular Expression
Module 10
Introduction to Pig
Installation
Bags and collections
Commands and Scripts
Pig UDF
Module 11
Introduction to NoSQL
ACID / CAP / BASE
Key-value pair
MapReduce
Column family
HBase
Document
Graph DB
Neo4j
Module 12
Introduction to HBase and installation
The HBase Data Model
The HBase Shell
HBase Architecture
Schema Design
The HBase API
HBase Configuration and Tuning
Module 13
Introduction to Sqoop and installation
Bulk loading
Hadoop Streaming
Module 14
Flume Architecture
Agent, source, sink, channel
Ingesting log files
Collecting data from Twitter for sentiment analysis
Module 15
Integration with ETL: Talend Open Studio for Big Data
Module 16
Big Data Analytics
Visualization, dimensional modelling, Tableau
Module 17
Spark
Spark shell hands-on using HDFS
Create RDD from HDFS file
Creating a new RDD: transformations on RDDs
Lineage Graph
Actions on RDD
RDD concepts: persist and cache
Lazy evaluation of RDDs
Hands-on and core concepts of map() transformation
Hands-on and core concepts of filter() transformation
Hands-on and core concepts of flatMap() transformation
Comparing map() and flatMap() transformations
Hands-on and core concepts of reduce() action
Hands-on and core concepts of fold() action
Hands-on and core concepts of aggregate() action
Basics of Accumulators
Hands-on and core concepts of collect() action
Hands-on and core concepts of take() action
Apache Spark Execution Model
How Spark executes a program
Concepts of RDD partitioning
RDD data shuffling and performance issues
Module 18
Spark SQL
Module 19
spark-submit and Spark applications
Module 20
Kafka: publisher/subscriber
Consumer and producer
Module 22
Cloudera Manager and VM; Hue
Module 23
Oozie: workflow and coordinator
Module 24
Introduction to Machine learning
Introduction to Statistical Analysis
Introduction to Sentiment Analysis
Introduction to Cloudera / Hortonworks / Greenplum
Module 25
Multi-node cluster setup
High availability
Hadoop data federation
Commissioning and decommissioning
Automatic and manual failover
ZooKeeper failover controller
Use cases, Case studies and Proof of Concept
Working on different Distributions
Module 26 (Optional)
Cloudera and Hortonworks certification question discussion