Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy.

From Wikipedia

Apache Hadoop is 100% open source, and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and different systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and can scale without limits. With Hadoop, no data is too big.

From Cloudera


Apache Spark™ is a fast and general engine for large-scale data processing.

Day 1: Big Data Landscape

Why Big Data? The three V's. Hadoop Ecosystem.

Introduction to Apache Spark.

Features of Apache Spark.

Apache Spark Stack.

Introduction to RDDs.

RDD Transformations.

What is good and bad in MapReduce?

Why use Apache Spark?

Day 2: Installation

Single-node, pseudo-distributed, and multi-node cluster.

Installing Hadoop.

Installing Apache Spark.

Installing Hive.

Installing Sqoop.

Installing Hue.

Day 3: Deep Dive in HDFS

HDFS Design.

Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode).

Rack Awareness.

Read/Write from HDFS.

HDFS Federation and High Availability (Hadoop 2.x.x).

HDFS Command Line Interface.

Day 4: Spark Shell Hands On Using HDFS

Spark Shell Introduction.

Create a file using Hue; extract the file from HDFS in the Spark shell.

Create RDD from HDFS file.

Day 5: Programming with RDD Part-1

Creating new RDD.

Transformations on RDD.

Lineage Graph.

Actions on RDD.

RDD persist and cache concepts.

Lazy evaluation of RDDs.
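The lazy-evaluation idea above can be sketched with plain Python generators (a rough analogy only, not actual Spark code): transformations build up a pipeline, but nothing is computed until an action forces evaluation.

```python
# Rough plain-Python analogy for lazy RDD evaluation (not actual Spark):
# generator expressions, like RDD transformations, do no work until
# an "action" (here, list()) forces the pipeline to run.

data = range(1, 6)                      # source "RDD": 1..5

doubled = (x * 2 for x in data)         # "transformation": nothing computed yet
evens = (x for x in doubled if x > 4)   # chained "transformation": still lazy

result = list(evens)                    # "action": the whole pipeline runs now
print(result)                           # [6, 8, 10]
```

In Spark the same shape appears as `rdd.map(...).filter(...)` followed by an action such as `collect()`; until the action runs, only the lineage graph exists.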

Day 6: Scala/Spark Functional Programming

Using Function Literals.

Anonymous Functions.

Defining a function that accepts another function (higher-order functions).
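The three topics above (function literals, anonymous functions, higher-order functions) can be illustrated in a few lines; the course works in Scala, but the same ideas are shown here in Python for brevity.

```python
# A higher-order function: takes another function f and applies it twice.
def apply_twice(f, x):
    return f(f(x))

# An anonymous function (function literal) bound to a name.
square = lambda n: n * n

print(apply_twice(square, 3))            # square(square(3)) = 81
print(apply_twice(lambda n: n + 10, 5))  # (5 + 10) + 10 = 25
```

In Scala the equivalent literal would be written `(n: Int) => n * n`, and Spark's `map`, `filter`, and friends are exactly such higher-order functions.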

Day 7: RDD Transformation Programming in Depth

Hands on and core concepts of map() transformation.

Hands on and core concepts of filter() transformation.

Hands on and core concepts of flatMap() transformation.

Compare map and flatMap transformation.
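The map/filter/flatMap comparison above can be sketched on ordinary Python lists (an analogy for the RDD transformations, not actual Spark code): map yields one output element per input, while flatMap flattens the per-element results into a single sequence.

```python
lines = ["hello world", "hello spark"]

# map-style: one result per input line -> a list of lists.
mapped = [line.split() for line in lines]
# [['hello', 'world'], ['hello', 'spark']]

# flatMap-style: the per-line lists are flattened into one sequence.
flat_mapped = [word for line in lines for word in line.split()]
# ['hello', 'world', 'hello', 'spark']

# filter-style: keep only elements passing a predicate.
filtered = [w for w in flat_mapped if w != "hello"]
# ['world', 'spark']
```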

Day 8: Apache Spark in Action

Hands on and core concepts of reduce() action.

Hands on and core concepts of fold() action.

Hands on and core concepts of aggregate() action.

Basics of accumulators.

Hands on and core concepts of collect() action.

Hands on and core concepts of take() action.

Ordered access to RDDs.
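The reduce/fold/aggregate actions listed above differ mainly in their initial value and result type; a simplified plain-Python sketch (note that in real Spark the fold/aggregate zero value is applied per partition, which this single-sequence analogy ignores):

```python
from functools import reduce

nums = [1, 2, 3, 4]

# reduce(): combine elements pairwise with a binary function.
total = reduce(lambda a, b: a + b, nums)          # 10

# fold(): like reduce, but with an explicit zero/initial value.
folded = reduce(lambda a, b: a + b, nums, 0)      # 10

# aggregate(): the result type may differ from the element type --
# here a (sum, count) pair accumulated in one pass to get an average.
acc = reduce(lambda sc, x: (sc[0] + x, sc[1] + 1), nums, (0, 0))
average = acc[0] / acc[1]                         # 10 / 4 = 2.5
```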

Day 9: Apache Spark Execution Model

How Spark executes a program.

Concepts of RDD partitioning.

RDD data shuffling and performance issues.

Day 10: Apache Spark PairRDD

Core concepts of PairRDD.

Creation of PairRDD.

Aggregation in PairRDD.

Aggregation functions understanding in depth.

How does reduceByKey() work conceptually?

How does foldByKey() work conceptually?

How does combineByKey() work conceptually?
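The per-key aggregation pattern behind reduceByKey() and combineByKey() can be sketched with plain Python dicts (an analogy, not Spark itself): reduceByKey merges values of the same type, while combineByKey builds a different accumulator type per key.

```python
from collections import defaultdict

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

# reduceByKey-style: merge values per key with a binary function (here: +).
by_key = defaultdict(int)
for key, value in pairs:
    by_key[key] += value
# {'a': 4, 'b': 6}

# combineByKey-style: accumulate a (sum, count) pair per key,
# mirroring the createCombiner / mergeValue roles, to compute averages.
combined = {}
for key, value in pairs:
    if key not in combined:
        combined[key] = (value, 1)          # createCombiner
    else:
        s, c = combined[key]
        combined[key] = (s + value, c + 1)  # mergeValue
averages = {k: s / c for k, (s, c) in combined.items()}
# {'a': 2.0, 'b': 3.0}
```

In Spark the same work also involves a mergeCombiners step that combines partial accumulators across partitions, which a single-machine dict does not need.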

Day 11: Spark PairRDD Hands-On Lab





Day 12: Spark PairRDD Joining and Zipping

reduceByKey versus groupByKey performance issues.

Joins (left, right, inner, etc.).
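The join variants above can be sketched over two small key-value datasets using plain Python (an analogy for PairRDD `join` and `leftOuterJoin`, not Spark code): an inner join keeps only keys present on both sides, a left outer join keeps every left key and fills missing right values.

```python
left = [("a", 1), ("b", 2), ("c", 3)]
right = [("a", "x"), ("b", "y")]

right_map = dict(right)

# inner-join-style: keys present in both datasets.
inner = [(k, (v, right_map[k])) for k, v in left if k in right_map]
# [('a', (1, 'x')), ('b', (2, 'y'))]

# left-outer-join-style: all left keys; missing right values become None
# (Spark would wrap these in Option/None on the right side).
left_outer = [(k, (v, right_map.get(k))) for k, v in left]
# [('a', (1, 'x')), ('b', (2, 'y')), ('c', (3, None))]
```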

Day 13-A: Understanding Hadoop SequenceFile

Day 13-B: Creating SequenceFiles and Processing Them Using Spark

Creating a SequenceFile from a TSV file.

Loading Data in Apache Hive.

Processing a SequenceFile as an RDD.

Day 14 : Spark Shared Variables

Shared variables: broadcast variables and accumulators.

Day 15 : Spark Accumulator

Word count and Character Count.

Counting Bad records in a file.

Day 16: Spark Broadcast Variables

Joining two CSV files, one as a broadcast lookup table.

Day 17 : Spark API

Broadcast variables, filter functions, and saving files.

Day 18 : Spark API

Spark join, groupBy, and swap functions.

Day 19 : Spark API

Removing the header from a CSV file and mapping each column to row data.

Day 20 : Spark SQL


SchemaRDD replaced by the DataFrame API.

History of SparkSQL.

Catalyst Optimizer.

Day 21 : SparkSQL Hands-On Sessions

Hive Configuration.

Create Hive table using Spark.

Load data into a Hive table using Spark.

Create another table using DataFrame.

Day 22 : Implementing Business Logic using SparkSQL

Loading CSV file.

Case classes (to create a schema for the CSV file).

Converting an RDD to a DataFrame using the DataFrame API to query data.

Using SQL queries on a DataFrame.

Day 23 : Spark Loading and Saving Your Data


CSV and TSV files.

JSON Files.

Day 24 : Spark Loading and Saving Your Data: SQL and NoSQL


HBase (NoSQL).

Day 25 : Writing Spark Applications

Spark Applications vs. Spark Shell.

Creating the SparkContext.

Configuring Spark Properties.

Building and Running a Spark Application.


Day 26 : Spark Streaming in Depth Part-1

Spark Streaming overview. Example: streaming word count.

Day 27 : Spark Streaming in Depth Part-2

Other Streaming Operations.

Sliding Window Operation.

Developing Spark Streaming Applications.
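The sliding-window operation above can be sketched in plain Python with a bounded deque (an analogy for windowed aggregation over micro-batches, not actual Spark Streaming code): the window keeps only the last N batches and is re-aggregated as each new batch arrives.

```python
from collections import deque

window_size = 3
window = deque(maxlen=window_size)   # oldest batch falls off automatically

batch_counts = [5, 2, 7, 1, 4]       # e.g. event counts per micro-batch
windowed_totals = []
for count in batch_counts:
    window.append(count)             # new batch enters the window
    windowed_totals.append(sum(window))  # aggregate over the current window

print(windowed_totals)               # [5, 7, 14, 10, 12]
```

In Spark Streaming the analogous call is a windowed operation such as `reduceByKeyAndWindow`, parameterized by window length and slide interval.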

Day 28 : Spark Algorithms Part-1

Iterative Algorithm.

Graph Analysis.

Machine Learning.

Day 29 : Case studies

Day 1

Introduction to Big Data.


The why, how, and what of Big Data.


Day 2

Introduction to Hadoop Ecosystem.


MapReduce (MRv1).

Hadoop v1 vs. v2; HDFS federation.


Day 3

Prerequisites for installation.

VM with Linux (Ubuntu/CentOS), JDK, SSH, Eclipse.

Installation and configuration of Hadoop: HDFS daemons and YARN daemons.

High Availability.

Automatic and manual failover.

Writing Data to HDFS.

Reading Data from HDFS.

Day 4

Replica placement Strategy.

Failure Handling.

NameNode.

Blocks and safe mode.

Rebalancing and load optimization.

Troubleshooting and error rectification.

Hadoop fs shell commands; Unix and Java basics.

Day 5

Introduction to MapReduce.

Architecture of MapReduce.

Executing MapReduce in YARN.

ApplicationMaster, ResourceManager, and NodeManager. InputFormat and key-value pairs.




Custom and default InputFormats.

Shuffle and Sort.


ApplicationMaster / Application Manager.

Containers and NodeManager.

Day 6

MapReduce hands-on.

Word count program / log analytics.

Hadoop streaming in R and Python.

Data processing Transformations.

Map only jobs and uber jobs.

Inverted index and searches.

Day 7

MR Programs 2:

Structured and Unstructured Data handling.



Single and multiple columns.

Inverted Index.

XML (semi-structured).

Map-side joins.

Reduce-side joins.

Day 8

Introduction to the Hive data warehouse.


Configuring the metastore with MySQL; HiveQL commands.

Day 9

Manipulation and analytical functions in Hive.

Managed tables and external tables.

Partitioning and Bucketing.

Complex data types and Unstructured data.

Advanced HiveQL commands.


Integration with HBase.

SerDe / Regular Expression.

Day 10

Introduction to Pig.

Installation; bags and collections.

Commands and Scripts.

Pig UDF.

Day 11

Introduction to NoSQL.


Key-value stores.

MapReduce.

Column-family stores: HBase.

Document stores: MongoDB.

Graph databases.


Day 12

Introduction to HBase and installation.

The HBase Data Model.

The HBase Shell.

HBase Architecture.

Schema Design.

The HBase API.

HBase Configuration and Tuning.

Day 13

Introduction to Sqoop and installation.

Bulk loading.

Hadoop Streaming.

Day 14

Flume Architecture.

Agents, sources, sinks, and channels.

Ingesting log files.

Collecting data from Twitter for sentiment analysis.

Day 15

Integration with ETL: Talend Open Studio.


Day 16

Big data Analytics.

Visualization and dimensional modelling with Tableau.

Day 17


Spark Shell Hands On Using HDFS.

Create RDD from HDFS file.

Creating new RDDs; transformations on RDDs.

Lineage Graph.

Actions on RDD.

RDD persist and cache concepts; lazy evaluation of RDDs.

Hands on and core concepts of map() transformation.

Hands on and core concepts of filter() transformation.

Hands on and core concepts of flatMap() transformation.

Compare map and flatMap transformations.

Hands on and core concepts of reduce() action.

Hands on and core concepts of fold() action.

Hands on and core concepts of aggregate() action.

Basics of Accumulator.

Hands on and core concepts of collect() action.

Hands on and core concepts of take() action.

Apache Spark Execution Model.

How Spark executes a program.

Concepts of RDD partitioning.

RDD data shuffling and performance issue.

Day 18

Spark SQL

Day 19

spark-submit and Spark applications.


Kafka: publisher/subscriber model.

Consumers and producers.

Day 22

Cloudera Manager, VM, and Hue.

Day 23

Oozie: workflows and coordinators.

Day 24

Introduction to data science: machine learning, statistical analysis, sentiment analysis. Cloudera / Hortonworks / Greenplum.

Day 25

Multi-node cluster setup. High availability. Hadoop data federation. Commissioning and decommissioning. Automatic and manual failover. ZooKeeper failover controller. Use cases, case studies, and proofs of concept. Working on different distributions.

Day 26:(Optional)

Cloudera and Hortonworks certification questions discussion.

What is Big Data?

Big Data Facts

The Three V’s of Big Data

Understanding Hadoop

What is Hadoop? Why learn Hadoop?

Relational Databases vs. Hadoop

Motivation for Hadoop

6 Key Hadoop Data Types

The Hadoop Distributed File system (HDFS)

What is HDFS?

HDFS components

Understanding Block storage

The Name Node

The Data Nodes

Data Node Failures

HDFS Commands

HDFS File Permissions

The MapReduce Framework

Overview of MapReduce

Understanding MapReduce

The Map Phase

The Reduce Phase

WordCount in MapReduce

Running a MapReduce Job
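The WordCount flow above can be expressed as explicit map, shuffle/sort, and reduce phases; this is a plain-Python sketch of the MapReduce pattern, not actual Hadoop code.

```python
from itertools import groupby

lines = ["hello world", "hello hadoop"]

# Map phase: emit a (word, 1) pair for every word in the input.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle and sort: bring all pairs with the same key together.
mapped.sort(key=lambda kv: kv[0])

# Reduce phase: sum the counts for each distinct word.
counts = {key: sum(v for _, v in group)
          for key, group in groupby(mapped, key=lambda kv: kv[0])}

print(counts)   # {'hadoop': 1, 'hello': 2, 'world': 1}
```

In Hadoop the same three stages are split across Mapper tasks, the framework's shuffle/sort, and Reducer tasks, each running in parallel across the cluster.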

Planning Your Hadoop Cluster

Single Node Cluster Configuration

Multi-Node Cluster Configuration

Checking HDFS Status

Breaking the cluster

Copying Data Between Clusters

Adding and Removing Cluster Nodes

Rebalancing the cluster

Name Node Metadata Backup

Cluster Upgrading

Installing and Managing Hadoop Ecosystem Projects







Managing and Scheduling Jobs

Managing Jobs

The FIFO Scheduler

The Fair Scheduler

How to stop and start jobs running on the cluster

Cluster Monitoring, Troubleshooting, and Optimizing

General System conditions to Monitor

NameNode and JobTracker Web UIs

View and Manage Hadoop’s Log files

Ganglia Monitoring Tool

Common cluster issues and their resolutions

Benchmark your cluster’s performance

Populating HDFS from External Sources

How to use Sqoop to import data from RDBMSs to HDFS

How to gather logs from multiple systems using Flume

Features of Hive, Hbase and Pig

How to populate HDFS from external Sources
