Syllabus
Module 1: Big Data Integration Course Introduction
Course Agenda
Accessing the lab environment
Related Courses
Module 2: Big Data Basics
What is Big Data?
Hadoop concepts
Hadoop Architecture Components
The Hadoop Distributed File System (HDFS)
Purposes of a Name Node & Secondary Name Node
MapReduce
“Yet Another Resource Negotiator” (YARN) (MapReduce Version 2)
Module 3: Data Warehouse Offloading
Challenges with traditional Data Warehousing
The requirements of an optimal Data Warehouse
The Data Warehouse Offloading Process
Module 4: Ingestion and Offload
PowerCenter Reuse Reports
Importing PowerCenter Mappings to Developer
SQOOP
SQL to Mapping capability
Partitioning and parallelism
Module 5: Big Data Management Architecture
The Big Data world
Build once, deploy anywhere
The Informatica abstraction layer
Polyglot computing
The Smart Executor
Open source and innovation
Connection architecture
Connections to third-party applications
Module 6: Informatica Polyglot Computing in Hadoop
Hive MR/Tez
Blaze
Spark
Native
The Smart Executor
Module 7: Mappings, Monitoring, and Troubleshooting
Configuring and running a mapping in Native and Hadoop environments
Execution Plans
Monitor mappings
Troubleshoot mappings
Viewing mapping results
Module 8: Hadoop Data Integration Challenges and Performance Tuning
Describe challenges with executing mappings in Hadoop
Big Data Management Performance Tuning
Hive Environment Optimization
Tips
Module 9: Data Quality on Hadoop
The Data Quality process
Discover insights into your data
Collaborate and Create Data Improvement Assets
Modify, Manage, and Monitor Data Quality
Self Service Data Quality
Executing Data Quality mappings on Hadoop
Module 10: Complex File Parsing
The Complex file reader
The Data Processor transformation
The Complex file writer
Performance Considerations: Partitioning
Parsing and processing Avro, Parquet, JSON, and XML files
Data Processor Transformation Considerations
Module 11: Accessing NoSQL Databases
CAP Theorem
HBase
MongoDB
Cassandra