Hadoop Administrator Training
Training in Chennai
About Big Data
Big data is a broad term for data sets so large or complex that traditional data-processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. Apache Hadoop is 100% open source and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and it scales out as servers are added. With Hadoop, no data is too big.
Introduction to the Hadoop Ecosystem; Architecture; HDFS; MapReduce (MRv1); Hadoop v1 and v2; Hadoop Data Federation; Linux VM (Ubuntu/CentOS); JDK, SSH, Eclipse; Installation and configuration of Hadoop; HDFS daemons; YARN daemons; High Availability; Automatic and manual failover; Writing data to HDFS; Reading data from HDFS; Replica placement strategy; Failure handling; NameNode, DataNode, Block; Safe mode, rebalancing and load optimization; Troubleshooting and error rectification; Hadoop fs shell commands; Unix and Java basics.
Module 1
- What is Big Data?
- Big Data Facts
- The Three V’s of Big Data
- Understanding Hadoop
- What is Hadoop? Why learn Hadoop?
- Relational Databases Vs. Hadoop
- Motivation for Hadoop
- 6 Key Hadoop Data Types
- The Hadoop Distributed File System (HDFS)
Module 2
- What is HDFS?
- HDFS components
- Understanding Block Storage
- The NameNode
- The DataNodes
- DataNode Failures
- HDFS Commands
- HDFS File Permissions
- The MapReduce Framework
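As a rough illustration of the block storage topic above, the sketch below estimates how many blocks a file occupies and how much raw disk it consumes across the cluster. It assumes a 128 MB block size and a replication factor of 3, which are the usual Hadoop 2 defaults (Hadoop 1 defaulted to 64 MB); the function name hdfs_footprint is hypothetical, and a real cluster takes these values from its dfs.blocksize and dfs.replication settings.

```python
MIB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MIB, replication=3):
    """Estimate the HDFS footprint of one file (illustrative helper, not a Hadoop API).

    Returns (blocks, block_replicas, raw_bytes).
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication <= 0:
        raise ValueError("sizes and replication must be positive")
    # Ceiling division: a partially filled last block still counts as a block.
    blocks = -(-file_size_bytes // block_size_bytes)
    # Every block is stored `replication` times on different DataNodes.
    block_replicas = blocks * replication
    # HDFS does not pad the last block, so raw usage is size times replication.
    raw_bytes = file_size_bytes * replication
    return blocks, block_replicas, raw_bytes


if __name__ == "__main__":
    blocks, replicas, raw = hdfs_footprint(1024 * MIB)
    print(blocks, replicas, raw // MIB)
```

Under these assumptions a 1 GB file splits into 8 blocks, is stored as 24 block replicas, and uses about 3 GB of raw disk.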
Module 3
- Overview of MapReduce
- Understanding MapReduce
- The Map Phase
- The Reduce Phase
- WordCount in MapReduce
- Running a MapReduce Job
- Planning Your Hadoop Cluster
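The Map and Reduce phases listed in the MapReduce topics above can be sketched with WordCount in plain Python. This is only a single-process illustration of the idea, not Hadoop's Java API: the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical, and on a real cluster the framework performs the grouping step and runs many map and reduce tasks in parallel.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

Each input line stands in for one record handed to a mapper; the reducer only ever sees one word together with all of its counts.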
Module 4
- Single Node Cluster Configuration
- Multi-Node Cluster Configuration
- Checking HDFS Status
- Breaking the cluster
- Copying Data Between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing the cluster
- NameNode Metadata Backup
- Cluster Upgrading
Module 5
- Installing and Managing Hadoop Ecosystem Projects
- Sqoop
- Flume
- Hive
- Pig
- HBase
- Oozie
Module 6
- Managing and Scheduling Jobs
- Managing Jobs
- The FIFO Scheduler
- The Fair Scheduler
- How to stop and start jobs running on the cluster
- Cluster Monitoring, Troubleshooting, and Optimizing
Module 7
- General System Conditions to Monitor
- NameNode and JobTracker Web UIs
- View and Manage Hadoop’s Log Files
- Ganglia Monitoring Tool
- Common cluster issues and their resolutions
- Benchmark your cluster’s performance
- Populating HDFS from External Sources
- How to use Sqoop to import data from RDBMSs to HDFS
- How to gather logs from multiple systems using Flume
- Features of Hive, HBase and Pig
- How to populate HDFS from external sources