Informatica BDM – Geoinsyssoft

Syllabus

Module 1: Big Data Integration Course Introduction

Course Agenda
Accessing the lab environment
Related Courses

Module 2: Big Data Basics

What is Big Data?
Hadoop concepts
Hadoop Architecture Components
The Hadoop Distributed File System (HDFS)
Purposes of the NameNode and Secondary NameNode
MapReduce
“Yet Another Resource Negotiator” (YARN) (MapReduce Version 2)

Module 3: Data Warehouse Offloading

Challenges with traditional Data Warehousing
The requirements of an optimal Data Warehouse
The Data Warehouse Offloading Process

Module 4: Ingestion and Offload

PowerCenter Reuse Reports
Importing PowerCenter Mappings into Developer
Sqoop
SQL to Mapping capability
Partitioning and parallelism

Module 5: Big Data Management Architecture

The Big Data world
Build once, deploy anywhere
The Informatica abstraction layer
Polyglot computing
The Smart Executor
Open source and innovation
Connection architecture
Connections to third-party applications

Module 6: Informatica Polyglot Computing in Hadoop

Hive MR/Tez
Blaze
Spark
Native
The Smart Executor

Module 7: Mappings, Monitoring, and Troubleshooting

Configuring and running a mapping in Native and Hadoop environments
Execution Plans
Monitor mappings
Troubleshoot mappings
Viewing mapping results

Module 8: Hadoop Data Integration Challenges and Performance Tuning

Challenges of executing mappings in Hadoop
Big Data Management Performance Tuning
Hive Environment Optimization
Tips

Module 9: Data Quality on Hadoop

The Data Quality process
Discover insights into your data
Collaborate and Create Data Improvement Assets
Modify, Manage, and Monitor Data Quality
Self Service Data Quality
Executing Data Quality mappings on Hadoop

Module 10: Complex File Parsing

The Complex file reader
The Data Processor transformation
The Complex file writer
Performance Considerations: Partitioning
Parsing and processing Avro, Parquet, JSON, and XML files
Data Processor Transformation Considerations

Module 11: Accessing NoSQL Databases

CAP Theorem
HBase
MongoDB
Cassandra