Big Data Hadoop Developer Training
(Hadoop, Spark, NoSQL, Cloud)
Training in Chennai
Module 1
- Introduction to Big Data
- Characteristics
- The Why, How and What of Big Data
- Existing OLTP, ETL, DWH, OLAP
Module 2
- Introduction to Hadoop Ecosystem
- Architecture-HDFS
- Sharding, Distribution and Replication factor (SDR)
- Daemons
- MapReduce (MRv1) and YARN
- Hadoop v1 and v2
- Hadoop Data federation
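
To make the block and replication-factor topics above concrete, here is a minimal arithmetic sketch. It assumes a 128 MB block size and a replication factor of 3, which are the usual Hadoop v2 defaults; a given cluster may be configured differently. The function name `hdfs_footprint` is hypothetical and used only for illustration.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (number of blocks, total block replicas, raw storage in MB)."""
    # A file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the cluster.
    replicas = blocks * replication
    # A partial last block only occupies the space of the data it holds,
    # so raw storage is file size times replication.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb

print(hdfs_footprint(1000))  # (8, 24, 3000)
```

A 1000 MB file therefore needs 8 blocks, 24 block replicas and roughly 3000 MB of raw disk under these assumptions.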
Module 3
- Prerequisite for Installation
- Single node, Pseudo-distributed and Multinode cluster
- Virtual machine using Linux (Ubuntu/CentOS)
- Installation of Hadoop in cloud (Azure/AWS)
- Installation of Java, SSH, Eclipse
- Installation and configuration of Hadoop, HDFS Daemons, YARN Daemons
- High Availability (Active and Standby)
- Automatic and manual failover
- Hadoop fs shell commands
- Writing Data to HDFS
- Reading Data from HDFS
Module 4
- Rack awareness policy and Replica placement Strategy
- Failure Handling
- Namenode
- Datanode
- Block-Safe mode
- Rebalancing and load optimization
- Troubleshooting and error rectification
- Hadoop fs shell commands, Unix and Java basics
- Assessment 1
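
The rack awareness and replica placement topic above can be sketched in a few lines. This is a simplified illustration of the default HDFS placement for a replication factor of 3 (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack). It assumes the writer is itself a DataNode and that some remote rack has at least two nodes. The names `place_replicas` and `topology` are hypothetical.

```python
import random

def place_replicas(writer, topology, rng=random):
    """Pick 3 nodes for one block.

    topology maps rack name -> list of node names (illustrative structure).
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the node that is writing the block.
    first = writer
    local_rack = rack_of[first]
    # Replica 2: a node on a different rack, to survive a whole-rack failure.
    candidates = [r for r in topology
                  if r != local_rack and len(topology[r]) >= 2]
    remote_rack = rng.choice(candidates)
    second = rng.choice(topology[remote_rack])
    # Replica 3: a different node on the same remote rack as replica 2.
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]

topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas("dn1", topology))
```

The result always spans exactly two racks, which balances write cost against tolerance of a single rack failure.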
Module 5
- Introduction to MapReduce
- Architecture of MapReduce
- MapReduce execution in YARN
- App Master, Resource Manager and Node Manager
- Input format, Input split and Key-Value pairs
- Classes and methods of the MapReduce paradigm
- Mapper
- Reducer
- Partitioner
- Custom and Default partition
- Shuffle and Sort
- Combiner and Scheduler
- App Master/Manager
- Container and Node Manager
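
As an illustration of the default partition topic above: Hadoop's default hash partitioner assigns a key to a reducer as (hashCode & Integer.MAX_VALUE) % numReduceTasks. The sketch below reproduces that formula in Python. The hash function is a stand-in that mimics Java's `String.hashCode` for ASCII keys; the hash Hadoop actually applies depends on the key's Writable type, so partition numbers on a real cluster may differ. Both function names are hypothetical.

```python
def java_string_hash(s):
    """Stand-in for key.hashCode(): Java's String.hashCode for ASCII text,
    returned as a signed 32-bit integer."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def hash_partition(key, num_reducers):
    """Clear the sign bit, then take the remainder, as the default
    hash partitioner does."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers

for key in ["hadoop", "spark", "hive", "hadoop"]:
    print(key, "-> reducer", hash_partition(key, 4))
```

Equal keys always land on the same reducer, which is what makes the per-key grouping in the reduce phase possible. A custom partitioner replaces `hash_partition` with any other deterministic rule, for example routing by a prefix of the key.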
Module 6
- MapReduce hands-on
- Word count program / log analytics
- Hadoop streaming in R/Python
- Data processing Transformations
- Map-only jobs and uber jobs
- Inverted index and searches
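
The word count program listed above follows the map, shuffle and reduce phases. The sketch below simulates those phases locally in plain Python, so it runs without a Hadoop installation; it is an illustration of the data flow, not the course's actual lab code. All function names are hypothetical.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle and sort phase: group values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)

def word_count(lines):
    pairs = [kv for line in lines for kv in mapper(line)]
    return dict(reducer(word, counts) for word, counts in shuffle(pairs))

lines = ["big data big cluster", "data node"]
print(word_count(lines))  # {'big': 2, 'cluster': 1, 'data': 2, 'node': 1}
```

With Hadoop streaming, the mapper and reducer would instead read lines from standard input and write tab-separated key/value pairs to standard output, while the framework performs the shuffle.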
Module 7
- MR Programs 2
- Structured and Unstructured Data handling
- Optimizing using Combiner
- Partitioner
- Single and multiple column
- Inverted Index
- XML (semi-structured data)
- Map side joins
- Reduce side join
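
The reduce-side join listed above can be sketched as follows: each mapper tags its records with the source dataset and emits the join key, and the reducer pairs up the tagged records that share a key. This is a local simulation in plain Python under assumed record layouts (customers as `(cust_id, name)`, orders as `(order_id, cust_id, amount)`); the datasets and names are made up for illustration.

```python
from collections import defaultdict

def map_customer(record):
    cust_id, name = record
    return cust_id, ("C", name)                 # tag: customer side

def map_order(record):
    order_id, cust_id, amount = record
    return cust_id, ("O", (order_id, amount))   # tag: order side

def reduce_join(key, tagged_values):
    """Inner join: combine every customer record with every order for the key."""
    names = [v for tag, v in tagged_values if tag == "C"]
    orders = [v for tag, v in tagged_values if tag == "O"]
    for name in names:
        for order_id, amount in orders:
            yield key, name, order_id, amount

def reduce_side_join(customers, orders):
    groups = defaultdict(list)                  # stands in for the shuffle
    for key, value in map(map_customer, customers):
        groups[key].append(value)
    for key, value in map(map_order, orders):
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined

customers = [(1, "Asha"), (2, "Ravi")]
orders = [(100, 1, 250), (101, 1, 75), (102, 3, 40)]
print(reduce_side_join(customers, orders))
```

A map-side join avoids the shuffle by loading the smaller dataset into memory in every mapper, which only works when that dataset is small enough to fit.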
Module 8
- Introduction to Hive Data Warehouse
- Installation of Hive and the metastore database
- Configuring the metastore with MySQL
- HiveQL commands
Module 9
- Manipulation and analytical functions in Hive
- Managed tables and external tables
- Partitioning and Bucketing
- Complex data types and Unstructured data
- Advanced HQL commands
- UDF and UDAF
- Integration with HBase
- SerDe / Regular Expression
- File formats
- JSON to AVRO file conversion
- Parquet compressed file to uncompressed
- AVRO schema and data file
- ORC file
- Assessment 2
Module 10
- Introduction to PIG
- Installation
- Bags and collections
- Commands and Scripts
- Pig UDF
Module 11
- Introduction to NoSQL
- ACID/CAP/BASE
- Key value pair
- MapReduce
- Column family
- HBase
- Document
- MongoDB
- Graph DB
- Neo4j
Module 12
- Introduction to HBase and installation
- The HBase Data Model
- The HBase Shell
- HBase Architecture
- Schema Design
- The HBase API
- HBase Configuration and Tuning
Module 13
- Ingest data from RDBMS
- Introduction to Sqoop and installation
- Import and export data from and to RDBMS
- Bulk loading, Incremental load, Split-by, Conditional query
- Sqoop validation and jobs
Module 14
- Ingest streaming data
- Flume Architecture
- Agent, Source, Sink, Channel
- Ingest log file
- Collecting data from Twitter for sentiment analysis
- Assessment 3
Module 15
- Integrate With ETL
- Talend Big data edition – Components of big data
Module 16
- Big data Analytics
- Dimensional modelling
- Data Visualization
- Tableau – Hive and Spark SQL connectors
Module 17
- Spark core and Components
- Spark Shell
- Create RDD from HDFS/Local
- Creating new RDDs: Transformations on RDD
- Lineage Graph – DAG
- Actions on RDD
- RDD concepts: Persist and Cache
- Lazy evaluation of RDD
- Hands on and core concepts of map() transformation
- Hands on and core concepts of filter() transformation
- Hands on and core concepts of flatMap() transformation
- Compare map and flatMap transformation
- Hands on and core concepts of reduce() action
- Hands on and core concepts of fold() action
- Hands on and core concepts of aggregate() action
- Basics of Accumulator
- Hands on and core concepts of collect() action
- Hands on and core concepts of take() action
- Apache Spark Execution Model
- How Spark executes a program
- Concepts of RDD partitioning
- RDD data shuffling and performance issues
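
The map(), flatMap(), fold() and aggregate() topics above differ in ways that are easy to see on a tiny dataset. The sketch below imitates their semantics in plain Python over a list of partitions, so it runs without Spark; the helper names (`rdd_map`, `rdd_flat_map`, `rdd_fold`, `rdd_aggregate`) are hypothetical stand-ins, not the PySpark API.

```python
from functools import reduce
from itertools import chain

# Two partitions of an imaginary RDD of text lines.
partitions = [["big data", "spark"], ["hadoop yarn"]]

def rdd_map(parts, f):
    """map: exactly one output element per input element."""
    return [[f(x) for x in p] for p in parts]

def rdd_flat_map(parts, f):
    """flatMap: f returns a sequence, and the sequences are flattened."""
    return [list(chain.from_iterable(f(x) for x in p)) for p in parts]

def rdd_fold(parts, zero, op):
    """fold: zero is applied once per partition and once more when merging."""
    per_partition = [reduce(op, p, zero) for p in parts]
    return reduce(op, per_partition, zero)

def rdd_aggregate(parts, zero, seq_op, comb_op):
    """aggregate: seq_op folds elements within a partition,
    comb_op merges the per-partition results."""
    per_partition = [reduce(seq_op, p, zero) for p in parts]
    return reduce(comb_op, per_partition, zero)

mapped = rdd_map(partitions, str.split)      # nested lists, one per line
flat = rdd_flat_map(partitions, str.split)   # individual words
lengths = rdd_map(flat, len)                 # [[3, 4, 5], [6, 4]]

total = rdd_fold(lengths, 0, lambda a, b: a + b)
sum_and_count = rdd_aggregate(
    lengths, (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
print(mapped)         # [[['big', 'data'], ['spark']], [['hadoop', 'yarn']]]
print(flat)           # [['big', 'data', 'spark'], ['hadoop', 'yarn']]
print(total)          # 22
print(sum_and_count)  # (22, 5)
```

Because the zero value is applied in every partition and again in the final merge, a zero that is not neutral for the operation changes the result: `rdd_fold(lengths, 1, lambda a, b: a + b)` gives 25 here, not 23.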
Module 18
- DataFrames and Datasets
- Spark SQL
- PySpark
Module 19
- Spark jobs
- Build Scala programs using SBT/Maven
- spark-submit and Spark applications
Module 20
- Kafka: Publisher/Subscriber
- Consumer and producer
Module 21
- HUE
- Monitoring and scheduling
Module 22
- Zeppelin
- Oozie: Workflow and Coordinator
Module 23
- Distribution Installation on cloud or Sandbox
- Cloudera – Cloudera Manager
- Hortonworks – Ambari Server
- MapR – MCS
Module 24
- Introduction to Data Science
- Machine learning
- Statistical Analysis
- Sentiment Analysis
Module 25
- Multinode cluster setup
- High Availability
- Hadoop data federation
- Commissioning and decommissioning
- Automatic and manual failover
- ZooKeeper failover controller
Module 26
- Use cases, Case studies and Proof of Concept
- Working on different Distributions
Module 27
- CCA Spark and Hadoop Developer Exam (CCA175)
- CCP Data Engineer (DE575)
- HDPCD Certification
- HDP Certified Apache Spark Developer