Big Data Hadoop Developer Training
(Hadoop, Spark, NoSQL, Cloud)

Training in Chennai

Module 1

  • Introduction to Big Data
  • Characteristics
  • The Why, How and What of Big Data
  • Existing OLTP, ETL, DWH, OLAP

Module 2

  • Introduction to Hadoop Ecosystem
  • HDFS Architecture
  • Sharding, Distribution and Replication factor (SDR)
  • Daemons
  • MapReduce (MRv1) and YARN
  • Hadoop v1 and v2
  • Hadoop Data federation

Module 3

  • Prerequisite for Installation
  • Single node, Pseudo-distributed and Multinode cluster
  • Virtual machine using Linux Ubuntu/CentOS
  • Installation of Hadoop in cloud (Azure/AWS)
  • Installation of Java, SSH, Eclipse
  • Installation and configuration of Hadoop, HDFS Daemons, YARN Daemons
  • High Availability (Active and Standby)
  • Automatic and manual failover
  • Hadoop fs shell commands
  • Writing Data to HDFS
  • Reading Data from HDFS

Module 4

  • Rack awareness policy and Replica placement Strategy
  • Failure Handling
  • Namenode
  • Datanode
  • Block-Safe mode
  • Rebalancing and load optimization
  • Troubleshooting and error rectification
  • Hadoop fs shell commands, Unix and Java basics
  • Assessment 1

Module 5

  • Introduction to MapReduce
  • Architecture of MapReduce
  • MapReduce execution in YARN
  • App Master, Resource Manager and Node Manager
  • Input format, Input split and Key-Value pairs
  • Classes and methods of the MapReduce paradigm
  • Mapper
  • Reducer
  • Partitioner
  • Custom and Default partition
  • Shuffle and Sort
  • Combiner, Scheduler
  • App Master/Manager
  • Container, Node Manager
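
The default and custom partition topics above can be previewed with a small sketch. It is plain Python, not the Hadoop API: Hadoop's default HashPartitioner assigns a key to `hash(key) mod number-of-reducers`, and the code below imitates that rule with `zlib.crc32` as a stand-in hash (an assumption, chosen only because it is stable between runs). The function names and the "a to m" custom rule are hypothetical.

```python
import zlib

def default_partition(key, num_reducers):
    """Hash-style partitioning: the same key always lands on the same reducer."""
    # crc32 is a stand-in for the key's hash code (assumption for this sketch).
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def custom_partition(key, num_reducers):
    """Hypothetical custom rule: keys starting with a-m go to reducer 0, the rest to reducer 1."""
    if num_reducers < 2:
        return 0
    return 0 if key[:1].lower() <= "m" else 1

if __name__ == "__main__":
    for word in ["hadoop", "spark", "hive"]:
        print(word, default_partition(word, 3), custom_partition(word, 3))
```

A custom rule like this is useful when related keys must reach the same reducer, at the cost of possibly uneven load.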

Module 6

  • MapReduce hands-on
  • Word count program / log analytics
  • Hadoop streaming in R/Python
  • Data processing transformations
  • Map-only jobs and uber jobs
  • Inverted index and searches
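
As a preview of the word count exercise, here is a minimal sketch of the map, shuffle-and-sort and reduce phases in plain Python. It runs on one machine and does not use Hadoop; the function names are illustrative only.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle and sort: group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)

def word_count(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs))

if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

In a real job the mapper and reducer run on different nodes and the framework performs the shuffle; the data flow is the same.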

Module 7

  • MR Programs 2
  • Structured and Unstructured Data handling
  • Optimizing using Combiner
  • Partitioner
  • Single and multiple column
  • Inverted Index
  • XML (semi-structured data)
  • Map-side joins
  • Reduce-side joins
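
The reduce-side join listed above can be sketched in plain Python as follows. This is a single-machine illustration under assumed inputs (two lists of `(key, value)` tuples named `customers` and `orders`, both hypothetical), not Hadoop code: the map step tags each record with its source, the shuffle groups records by join key, and the reduce step pairs records from the two sources.

```python
from collections import defaultdict
from itertools import product

def reduce_side_join(customers, orders):
    """Inner join of two (key, value) lists, done the reduce-side way."""
    # Map phase: tag every record with its source and emit it under the join key.
    tagged = [(cid, ("C", name)) for cid, name in customers]
    tagged += [(cid, ("O", item)) for cid, item in orders]

    # Shuffle phase: bring all records with the same key together.
    groups = defaultdict(list)
    for key, value in tagged:
        groups[key].append(value)

    # Reduce phase: for each key, pair every customer record with every order record.
    joined = []
    for key in sorted(groups):
        names = [v for tag, v in groups[key] if tag == "C"]
        items = [v for tag, v in groups[key] if tag == "O"]
        joined.extend((key, name, item) for name, item in product(names, items))
    return joined

if __name__ == "__main__":
    print(reduce_side_join([(1, "Asha"), (2, "Ravi")],
                           [(1, "book"), (1, "pen"), (3, "lamp")]))
```

A map-side join avoids the shuffle by loading the smaller data set into memory on every mapper, which is why it is usually preferred when one side is small.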

Module 8

  • Introduction to Hive data warehouse
  • Installation of Hive and metastore database
  • Configure metastore with MySQL
  • HiveQL commands

Module 9

  • Manipulation and analytical functions in Hive
  • Managed tables and external tables
  • Partitioning and Bucketing
  • Complex data types and Unstructured data
  • Advanced HQL commands
  • UDF and UDAF
  • Integration with HBase
  • SerDe / Regular Expression
  • File formats
  • JSON to AVRO file conversion
  • Parquet compressed file to uncompressed
  • AVRO schema and data file
  • ORC file
  • Assessment 2

Module 10

  • Introduction to PIG
  • Installation, bags and collections
  • Commands and Scripts
  • Pig UDF

Module 11

  • Introduction to NoSQL
  • ACID/CAP/BASE
  • Key value pair
  • Map reduce
  • Column family
  • HBase
  • Document
  • MongoDB
  • Graph DB
  • Neo4j

Module 12

  • Introduction to HBase and installation
  • The HBase Data Model
  • The HBase Shell
  • HBase Architecture
  • Schema Design
  • The HBase API
  • HBase Configuration and Tuning

Module 13

  • Ingest data from RDB
  • Introduction to Sqoop and installation
  • Import and export data from and to RDB
  • Bulk loading, Incremental load, Split by, Conditional query
  • Sqoop validation and jobs

Module 14

  • Ingest streaming data
  • Flume Architecture
  • Agent, Source, Sink, Channel
  • Ingest log file
  • Collecting data from Twitter for sentiment analysis
  • Assessment 3

Module 15

  • Integrate with ETL
  • Talend Big Data edition – Components of Big Data

Module 16

  • Big data Analytics
  • Dimensional modelling
  • Data Visualization
  • Tableau – Hive and Spark SQL connectors

Module 17

  • Spark core and Components
  • Spark Shell
  • Create RDD from HDFS/Local
  • Creating new RDDs: transformations on RDD
  • Lineage Graph – DAG
  • Actions on RDD
  • RDD concepts: persist and cache, lazy evaluation of RDD
  • Hands on and core concepts of map() transformation
  • Hands on and core concepts of filter() transformation
  • Hands on and core concepts of flatMap() transformation
  • Compare map and flatMap transformation
  • Hands on and core concepts of reduce() action
  • Hands on and core concepts of fold() action
  • Hands on and core concepts of aggregate() action
  • Basics of Accumulator
  • Hands on and core concepts of collect() action
  • Hands on and core concepts of take() action
  • Apache Spark Execution Model
  • How Spark executes a program
  • Concepts of RDD partitioning
  • RDD data shuffling and performance issues
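
The difference between map() and flatMap(), and between reduce(), fold() and aggregate(), can be illustrated without a cluster. The sketch below uses plain Python lists as a stand-in for RDDs (an assumption: it mirrors the semantics of the Spark operations but calls no Spark API), and the `aggregate` helper and the two-way split into `partitions` are hypothetical.

```python
from functools import reduce
from itertools import chain

lines = ["big data", "spark core"]

# map(): exactly one output element per input element (here, one list per line).
mapped = list(map(str.split, lines))

# flatMap(): each input may produce several outputs, flattened into one sequence.
flat_mapped = list(chain.from_iterable(map(str.split, lines)))

lengths = [len(word) for word in flat_mapped]

# reduce(): combine elements pairwise; there is no starting value.
total = reduce(lambda a, b: a + b, lengths)

# fold(): like reduce, but starts from a zero value, so it also works on empty input.
folded = reduce(lambda a, b: a + b, lengths, 0)

def aggregate(partitions, zero, seq_op, comb_op):
    """Fold each partition with seq_op, then merge the partial results with comb_op."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)

# aggregate(): the result type may differ from the element type, e.g. a (sum, count) pair.
partitions = [lengths[:2], lengths[2:]]
sum_count = aggregate(
    partitions,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)

if __name__ == "__main__":
    print(mapped, flat_mapped, total, folded, sum_count)
```

In Spark the same operations run per partition across the cluster and are only executed when an action such as reduce() or collect() is called.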

Module 18

  • DataFrames and Datasets
  • Spark SQL
  • PySpark

Module 19

  • Spark jobs
  • Build Scala program using SBT/Maven
  • spark-submit and Spark application

Module 20

  • Kafka: Publisher/Subscriber
  • Consumer and producer

Module 21

  • HUE
  • Monitoring and scheduling

Module 22

  • Zeppelin
  • Oozie: Workflow and Coordinator

Module 23

  • Distribution installation on cloud or Sandbox
  • Cloudera – Cloudera Manager
  • Hortonworks – Ambari server
  • MapR – MCS

Module 24

  • Introduction to Data science
  • Machine learning
  • Statistical Analysis
  • Sentiment Analysis

Module 25

  • Using a multinode cluster setup
  • High Availability
  • Hadoop data federation
  • Commissioning and decommissioning
  • Automatic and manual failover
  • ZooKeeper failover controller

Module 26

  • Use cases, case studies and Proof of Concept
  • Working on different distributions

Module 27

  • CCA Spark and Hadoop Developer Exam (CCA175)
  • CCP Data Engineer (DE575)
  • HDPCD Certification
  • HDP Certified Apache Spark Developer