Big Data Spark NoSQL Cloud Training
Training in Chennai
Module 1
- Introduction to Big Data
- Characteristics
- The Why, How and What of Big Data
- Existing OLTP, ETL, DWH, OLAP
Module 2
- Introduction to Hadoop Ecosystem Architecture: HDFS
- Sharding, Distribution and Replication factor (SDR)
- Daemons
- MapReduce (MRv1) and YARN
- Hadoop v1 and v2
- Hadoop Data federation
Module 3
- Prerequisite for Installation
- Single-node, pseudo-distributed and multi-node cluster
- Virtual machine using Linux Ubuntu/CentOS
- Installation and configuration of Hadoop, HDFS daemons, YARN daemons
- High Availability (Active and Standby)
- Automatic and manual failover
- Hadoop fs shell commands
- Writing Data to HDFS
- Reading Data from HDFS
Module 4
- Rack awareness policy and Replica Placement Strategy
- Failure Handling
- Namenode
- Datanode
- Block
- Safe mode
- Rebalancing and load optimization
- Troubleshooting and error rectification
- Hadoop fs shell commands
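
To give a feel for the replica placement topic in this module, here is a minimal Python sketch of the commonly described default HDFS rule: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. The function name `place_replicas`, the `topology` layout and the node names are all hypothetical, and the sketch assumes at least two racks with two or more nodes in the remote rack. It is an illustration of the idea, not HDFS code.

```python
import random

def place_replicas(writer_node, topology, replication=3, rng=random):
    """Choose target nodes for one block.

    topology: dict of rack name -> list of node names (hypothetical layout).
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # 1st replica: the node the writer runs on.
    targets = [writer_node]
    if replication >= 2:
        # 2nd replica: any node on a rack other than the writer's rack.
        remote_racks = [r for r in topology if r != rack_of[writer_node]]
        remote_rack = rng.choice(remote_racks)
        second = rng.choice(topology[remote_rack])
        targets.append(second)
    if replication >= 3:
        # 3rd replica: a different node on the same rack as the 2nd replica.
        candidates = [n for n in topology[remote_rack] if n != second]
        targets.append(rng.choice(candidates))
    return targets

# Hypothetical two-rack cluster with two datanodes per rack.
topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
targets = place_replicas("dn1", topology)
print(targets)
```

With this layout the block always ends up on `dn1` plus both nodes of `rack2`, so losing either rack still leaves at least one copy.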
Module 5
- Introduction to MapReduce
- Architecture of MapReduce
- Executing MapReduce in YARN
- App Master, Resource Manager and Node manager
- Input format, Input split and Key Value Pairs
- Classes and methods of the MapReduce paradigm
- Mapper
- Reducer
- Partitioner
- Custom and Default partition
- Shuffle and Sort
- Combiner
- Scheduler
- App Master/Manager
- Container and Node manager
Module 6
- MapReduce hands-on: word count program / log analytics
- Hadoop streaming in R/Python
- Data processing Transformations
- Map-only jobs and Uber jobs
- Inverted index and searches
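
As a preview of the word count exercise, the following minimal Python sketch runs the map, shuffle-and-sort and reduce phases inside one process. In a real Hadoop streaming job the mapper and reducer would be separate scripts that read lines from standard input and write tab-separated key/value pairs to standard output; the function names and the sample input here are made up for illustration.

```python
from itertools import groupby

def mapper(lines):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1

def reducer(sorted_pairs):
    """Sum the counts per word; the input must already be sorted by key."""
    for word, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

def word_count(lines):
    # sorted() stands in for the framework's shuffle-and-sort phase.
    shuffled = sorted(mapper(lines))
    return dict(reducer(shuffled))

sample = ["big data big cluster", "data node"]
counts = word_count(sample)
print(counts)
```

The reducer relies on equal keys arriving next to each other, which is why the sort step sits between the two phases.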
Module 7
- Structured and unstructured data handling; optimizing using Combiner/Partitioner
- Custom partition and default partition
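
The difference between default and custom partitioning can be sketched in a few lines of Python. The default rule shown here has the usual shape of hash partitioning (a non-negative hash taken modulo the number of reducers), but the hash itself is a stand-in: `zlib.crc32` is used only because it is stable across runs, and it is not the hash Hadoop applies to its keys. The custom rule (keys starting with a to m go to reducer 0) is purely hypothetical.

```python
import zlib

def stable_hash(key: str) -> int:
    # Stand-in hash (assumption): CRC32 of the UTF-8 bytes, not Hadoop's key hash.
    return zlib.crc32(key.encode("utf-8"))

def default_partition(key: str, num_reducers: int) -> int:
    # Hash partitioning: mask to a non-negative value, then modulo reducer count.
    return (stable_hash(key) & 0x7FFFFFFF) % num_reducers

def custom_partition(key: str, num_reducers: int) -> int:
    # Hypothetical business rule: keys starting with a..m always go to reducer 0,
    # all other keys are hash-partitioned over the remaining reducers.
    if num_reducers == 1:
        return 0
    if "a" <= key[:1].lower() <= "m":
        return 0
    return 1 + default_partition(key, num_reducers - 1)

keys = ["apple", "zebra", "mango", "spark"]
assignments = {k: custom_partition(k, 3) for k in keys}
print(assignments)
```

A custom rule like this is useful when related keys must land in the same output file, at the cost of possibly uneven reducer load.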
Module 8
- Introduction to Hive Data Warehouse
- Installation of Hive and metastore database
- Configure metastore with MySQL
- Creation of Hive table
- Different ways of loading data to Hive
- HiveQL commands
- Data transformations: joins, filter and others
Module 9
- Manipulation and analytical functions in Hive
- Managed table and external tables
- Partitioning and Bucketing
- Complex data types and unstructured data
- Advanced HQL commands
- UDF and UDAF
- Integration with HBase
Module 10
- SerDe / Regular Expression
- File formats
- JSON, AVRO file conversion
- Parquet compressed file to uncompressed
- AVRO schema and data file
- ORC file
Module 11
- Ingest data from RDB
- Introduction to Sqoop and installation
- Import and export data from and to RDB
- Bulk loading, incremental load, split-by, conditional query
- Sqoop validation and Sqoop jobs
- Data ingestion into Hive
- Data ingestion to HBase
- Different file formats
Module 12
- Ingest streaming data
- Flume Architecture
- Agent, source, sink, channel
- Ingest log file
- Collecting data from Twitter for sentiment analysis
Module 13
- Spark core and Components
- Spark Shell
- Create RDD from HDFS/local
- Creating new RDDs: transformations on RDD
- Lineage Graph – DAG
- Actions on RDD
- Different resource management
- Spark-shell Scala REPL
- PySpark
- Monitoring jobs
Module 14
- Scala/Spark Functional Programming
- Using Function Literals
- Anonymous Functions
- Define a function which accepts another function
- Spark Loading and Saving Your Data
- Text Files
- CSV and TSV files
- JSON Files
- Spark jobs
- Build Scala program using SBT/Maven
- Spark submit and Spark application
Module 15
- RDD Transformation Programming in Depth
- Hands on and core concepts of map() transformation
- Hands on and core concepts of filter() transformation
- Hands on and core concepts of flatMap() transformation
- Compare map() and flatMap() transformations
- Apache Spark in Action
- Hands on and core concepts of reduce() action
- Hands on and core concepts of fold() action
- Hands on and core concepts of aggregate() action
- Basics of Accumulator
- Hands on and core concepts of collect() action
- Hands on and core concepts of take() action
- Ordered access of RDD
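
The transformations and actions listed in this module can be previewed without a cluster. The sketch below is plain Python that imitates an RDD as a list of partitions; the helper names (`rdd_map`, `rdd_flat_map`, `rdd_fold`, `rdd_aggregate`, `rdd_collect`) are hypothetical stand-ins and not Spark APIs. It shows how map() keeps one output per input while flatMap() flattens the results, and how fold() and aggregate() apply the zero value once per partition and once more when merging.

```python
from functools import reduce
from itertools import chain

# A toy "RDD" of text lines split over two partitions.
partitions = [["big data", "spark"], ["hadoop yarn"]]

def rdd_map(parts, f):
    # One output element per input element.
    return [[f(x) for x in p] for p in parts]

def rdd_flat_map(parts, f):
    # f returns a sequence per element; the sequences are flattened.
    return [[y for x in p for y in f(x)] for p in parts]

def rdd_collect(parts):
    # Bring all partitions back as one list.
    return list(chain.from_iterable(parts))

def rdd_aggregate(parts, zero, seq_op, comb_op):
    # seq_op folds each partition starting from zero;
    # comb_op merges the partition results, again starting from zero.
    per_partition = [reduce(seq_op, p, zero) for p in parts]
    return reduce(comb_op, per_partition, zero)

def rdd_fold(parts, zero, op):
    # fold is aggregate with the same operator for both steps.
    return rdd_aggregate(parts, zero, op, op)

mapped = rdd_collect(rdd_map(partitions, str.split))
flat = rdd_collect(rdd_flat_map(partitions, str.split))
lengths = rdd_map(rdd_flat_map(partitions, str.split), len)

total_fold_0 = rdd_fold(lengths, 0, lambda a, b: a + b)
total_fold_1 = rdd_fold(lengths, 1, lambda a, b: a + b)
sum_count = rdd_aggregate(
    lengths,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),   # add one length to (sum, count)
    lambda a, b: (a[0] + b[0], a[1] + b[1]),   # merge two (sum, count) pairs
)
print(mapped, flat, total_fold_0, total_fold_1, sum_count)
```

Folding with a zero value of 1 gives 25 rather than 22 here, because the zero is added once for each of the two partitions and once in the final merge. This is why the zero value should be the identity element of the operator.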
Module 16
- Creating Dataframe
- Data Frames & Datasets
- Interoperating with RDDs
- JSON and Parquet File Formats
- Loading Data through Different Source
- RDD to DF and DF to RDD
- DataFrame operations (Dataset)
Module 17
- Need for Spark SQL
- What is Spark SQL?
- Spark SQL Architecture
- SQL Context in Spark SQL
Module 18
- Spark Streaming Overview
- Streaming data collections from different sources
- Other Streaming Operations
- Sliding Window Operation
- Developing Spark Streaming Applications
- Kafka integration
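
The sliding window operation in this module is governed by two settings, a window length and a slide interval, both expressed as multiples of the batch interval. The following plain-Python sketch imitates that behaviour on a list of micro-batches; `sliding_windows` and the sample events are hypothetical and do not use the Spark Streaming API.

```python
from collections import Counter

def sliding_windows(batches, window_length, slide_interval):
    """Yield (end_index, events) for each window.

    window_length and slide_interval are both counted in batches.
    """
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        # Merge the events of all batches that fall inside the window.
        window = [event for batch in batches[start:end] for event in batch]
        yield end, window

# Six micro-batches of events (made-up data).
batches = [["a"], ["b", "a"], ["c"], ["a"], ["b"], ["c", "c"]]

window_counts = [
    (end, dict(Counter(window)))
    for end, window in sliding_windows(batches, window_length=3, slide_interval=2)
]
print(window_counts)
```

With a window of 3 batches sliding every 2 batches, consecutive windows overlap by one batch, so an event in that batch is counted in both windows.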
Module 19
- Introduction to NoSQL
- ACID vs CAP theorem/BASE
- Schema design
- Introduction to HBASE and installation
- The HBase Data Model
- The HBase Shell
- HBase Architecture
- Schema Design
Module 20
- The HBase API
- HBase Configuration and Tuning
- Hive and HBase integration
- Loading data using Sqoop
- Time to live
- Compactions
- Tombstone
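
Time to live, tombstones and compactions fit together as follows: a delete does not erase data in place but appends a tombstone marker, expired or deleted cells are hidden from reads, and a major compaction later rewrites the data without them. The toy Python model below illustrates that idea under strong simplifications: it keeps only the latest version per row, and the class `ToyStore` with its methods is hypothetical, unrelated to the HBase client API or storage format.

```python
TOMBSTONE = object()  # marker appended by a delete instead of removing data

class ToyStore:
    """Append-only cell log; a toy model, not HBase's storage format."""

    def __init__(self, ttl):
        self.ttl = ttl    # how long a cell stays visible, in the same unit as timestamps
        self.cells = []   # (row, timestamp, value) tuples, never edited in place

    def put(self, row, value, ts):
        self.cells.append((row, ts, value))

    def delete(self, row, ts):
        self.cells.append((row, ts, TOMBSTONE))

    def get(self, row, now):
        versions = [(ts, v) for r, ts, v in self.cells if r == row]
        if not versions:
            return None
        ts, value = max(versions, key=lambda tv: tv[0])
        # The newest entry decides: a tombstone or an expired cell hides the row.
        if value is TOMBSTONE or now - ts > self.ttl:
            return None
        return value

    def major_compact(self, now):
        # Keep only the newest entry per row, then drop tombstones and expired cells.
        latest = {}
        for row, ts, value in self.cells:
            if row not in latest or ts >= latest[row][0]:
                latest[row] = (ts, value)
        self.cells = [
            (row, ts, v) for row, (ts, v) in latest.items()
            if v is not TOMBSTONE and now - ts <= self.ttl
        ]

store = ToyStore(ttl=100)
store.put("r1", "v1", ts=1)
store.put("r2", "v2", ts=2)
store.delete("r1", ts=3)
store.put("r3", "v3", ts=4)

visible_r1 = store.get("r1", now=10)   # hidden by the tombstone
visible_r2 = store.get("r2", now=10)
cells_before = len(store.cells)        # the deleted data is still stored
store.major_compact(now=103)           # r2 has expired by now, r1 is deleted
print(visible_r1, visible_r2, cells_before, store.cells)
```

Until the compaction runs, the store still holds all four entries, which is why deletes do not free space immediately in this model.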
Module 21
- Hue web interface
- Hive, Pig editors
- Oozie scheduler
- Coordinator
- Dashboard
- Configuration files and monitoring
Module 22
- Kafka
- Producer, consumer and topics
- Flume with Kafka
- Kafka topic with Spark Streaming
Module 23
- Hadoop distribution
- Cloudera components
- Hortonworks components
- Security
- Monitoring
- Dashboard
Module 24
- Zeppelin notebook
- Ambari
- Cloudera Manager
Module 25
- AWS and Azure in Big Data
- S3 or Azure Blob storage components and usage
Module 26
- Talend Big Data edition
- ETL Tool integration
- Data analytics using Tableau
- Connecting with Hadoop Hive server
- Interactive visualization
Module 27
- Cloudera Spark and Hadoop Developer certification
- Hortonworks certification
- Guidance and mock
Module 28
- Introduction to machine learning
- Applying machine learning algorithms in Hadoop and Spark MLlib
- Classification and clustering
Module 29
- Case study 1: Sqoop, HBase, Hive, Spark, Tableau
Module 30
- Case study 2: Kafka, Spark Streaming and HBase