Big Data, Spark, NoSQL and Cloud Training

Training in Chennai

Module 1

  • Introduction to Big Data
  • Characteristics
  • The why, how and what of Big Data
  • Existing OLTP, ETL, DWH and OLAP systems

Module 2

  • Introduction to Hadoop ecosystem architecture: HDFS
  • Sharding, distribution and replication factor (SDR)
  • Daemons
  • MapReduce (MRv1) and YARN
  • Hadoop v1 and v2
  • Hadoop Data federation

Module 3

  • Prerequisite for Installation
  • Single-node, pseudo-distributed and multi-node clusters
  • Virtual machine using Linux (Ubuntu/CentOS)
  • Installation and configuration of Hadoop, HDFS daemons and YARN daemons
  • High Availability (Active and Standby)
  • Automatic and manual failover
  • Hadoop fs shell commands
  • Writing Data to HDFS
  • Reading Data from HDFS

Module 4

  • Rack awareness policy and Replica Placement Strategy
  • Failure Handling
  • NameNode
  • DataNode
  • Block-Safe mode
  • Rebalancing and load optimization
  • Troubleshooting and error rectification
  • Hadoop fs shell commands

Module 5

  • Introduction to MapReduce
  • Architecture of MapReduce
  • MapReduce execution in YARN
  • App Master, Resource Manager and Node Manager
  • Input format, input split and key-value pairs
  • Classes and methods of the MapReduce paradigm
  • Mapper
  • Reducer
  • Partitioner
  • Custom and default partitioners
  • Shuffle and Sort
  • Combiner-Scheduler
  • App Master/Manager
  • Container-Node manager

Module 6

  • MapReduce hands-on: word count program and log analytics
  • Hadoop streaming in R/Python
  • Data processing transformations
  • Map-only jobs and uber jobs
  • Inverted index and searches

Module 7

  • Structured and unstructured data handling; optimizing with Combiner/Partitioner
  • Custom partition and default partition

Module 8

  • Introduction to Hive Data Warehouse
  • Installation of Hive and the metastore database
  • Configuring the metastore with MySQL
  • Creation of Hive tables
  • Different ways of loading data into Hive
  • HiveQL commands
  • Data transformations: joins, filters and others

Module 9

  • Manipulation and analytical functions in Hive
  • Managed table and external tables
  • Partitioning and Bucketing
  • Complex data types and unstructured data
  • Advanced HQL commands
  • UDF and UDAF
  • Integration with HBase

Module 10

  • SerDe / Regular Expression
  • File formats
  • JSON and Avro file conversion
  • Converting compressed Parquet files to uncompressed
  • AVRO schema and data file
  • ORC file

Module 11

  • Ingesting data from RDB
  • Introduction to Sqoop and installation
  • Importing and exporting data from and to RDB
  • Bulk loading, incremental load, split-by, conditional query
  • Sqoop validation and Sqoop jobs
  • Data ingestion into Hive
  • Data ingestion into HBase
  • Different file formats

Module 12

  • Ingest streaming data
  • Flume Architecture
  • Agent, source, sink and channel
  • Ingesting log files
  • Collecting data from Twitter for sentiment analysis

Module 13

  • Spark core and Components
  • Spark Shell
  • Creating RDDs from HDFS/local files
  • Creating new RDDs: transformations on RDDs
  • Lineage Graph – DAG
  • Actions on RDD
  • Different resource management
  • Spark-shell Scala REPL
  • PySpark
  • Monitoring jobs

Module 14

  • Scala/Spark Functional Programming
  • Using Function Literals
  • Anonymous Functions
  • Define a function which accepts another function
  • Spark Loading and Saving Your Data
  • Text Files
  • CSV and TSV files
  • JSON Files
  • Spark jobs
  • Building Scala programs using SBT/Maven
  • spark-submit and Spark applications

Module 15

  • RDD Transformation Programming in Depth
  • Hands on and core concepts of map() transformation
  • Hands on and core concepts of filter() transformation
  • Hands on and core concepts of flatMap() transformation
  • Comparing map() and flatMap() transformations
  • Apache Spark in Action
  • Hands on and core concepts of reduce() action
  • Hands on and core concepts of fold() action
  • Hands on and core concepts of aggregate() action
  • Basics of Accumulator
  • Hands on and core concepts of collect() action
  • Hands on and core concepts of take() action
  • Ordered access of RDD

Module 16

  • Creating DataFrames
  • Data Frames & Datasets
  • Interoperating with RDDs
  • JSON and Parquet File Formats
  • Loading Data through Different Source
  • RDD to DataFrame and DataFrame to RDD (DF.rdd)
  • DataFrame operations (Dataset)

Module 17

  • Need for Spark SQL
  • What is Spark SQL?
  • Spark SQL Architecture
  • SQLContext in Spark SQL

Module 18

  • Spark Streaming Overview
  • Streaming data collections from different sources
  • Other Streaming Operations
  • Sliding Window Operation
  • Developing Spark Streaming Applications
  • Kafka integration

Module 19

  • Introduction to NoSQL
  • ACID vs CAP theorem/BASE
  • Schema design
  • Introduction to HBase and installation
  • The HBase Data Model
  • The HBase Shell
  • HBase Architecture
  • Schema Design

Module 20

  • The HBase API
  • HBase Configuration and Tuning
  • Hive and HBase integration
  • Loading data using Sqoop
  • Time to live
  • Compactions
  • Tombstone

Module 21

  • Hue web interface
  • Hive and Pig editors
  • Oozie scheduler
  • Coordinator
  • Dashboard
  • Configuration files and monitoring

Module 22

  • Kafka
  • Producers, consumers and topics
  • Flume with Kafka
  • Kafka topics with Spark Streaming

Module 23

  • Hadoop distribution
  • Cloudera components
  • Hortonworks components
  • Security
  • Monitoring
  • Dashboard

Module 24

  • Zeppelin notebook
  • Ambari
  • Cloudera Manager

Module 25

  • AWS and Azure in Big Data
  • S3 or Azure Blob storage components and usage

Module 26

  • Talend Big Data edition
  • ETL Tool integration
  • Data analytics using Tableau
  • Connecting with Hadoop Hive server
  • Interactive visualization

Module 26

  • Cloudera Spark and Hadoop Developer certification
  • Hortonworks certification
  • Guidance and mock exams

Module 27

  • Introduction to machine learning
  • Applying machine learning algorithms in Hadoop and Spark MLlib
  • Classification and clustering

Module 28

  • Case study 1: Sqoop, HBase, Hive, Spark, Tableau

Module 29

  • Case study 2: Kafka, Spark Streaming and HBase