ETL TOOLS Course Content
Training in Chennai
Overview of Extraction, Transformation, and Loading
You need to load your data warehouse regularly so that it can serve its purpose of facilitating business analysis. To do this, data from one or more operational systems needs to be extracted and copied into the warehouse. The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading. The acronym ETL is perhaps too simplistic, because it omits the transportation phase and implies that each of the other phases of the process is distinct. We refer to the entire process, including data loading, as ETL. You should understand that ETL refers to a broad process, and not three well-defined steps.
The methodology and tasks of ETL have been well known for many years, and are not necessarily unique to data warehouse environments: a wide variety of proprietary applications and database systems are the IT backbone of any enterprise. Data has to be shared between applications or systems, trying to integrate them, giving at least two applications the same picture of the world. This data sharing was mostly addressed by mechanisms similar to what we now call ETL.
Data warehouse environments face the same challenge with the additional burden that they not only have to exchange but to integrate, rearrange and consolidate data over many systems, thereby providing a new unified information base for business intelligence. Additionally, the data volume in data warehouse environments tends to be very large.
What happens during the ETL process? During extraction, the desired data is identified and extracted from many different sources, including database systems and applications. Very often, it is not possible to identify the specific subset of interest, therefore more data than necessary has to be extracted, so the identification of the relevant data will be done at a later point in time. Depending on the source system’s capabilities (for example, operating system resources), some transformations may take place during this extraction process. The size of the extracted data varies from hundreds of kilobytes up to gigabytes, depending on the source system and the business situation. The same is true for the time delta between two (logically) identical extractions: the time span may vary between days/hours and minutes to near real-time. Web server log files for example can easily become hundreds of megabytes in a very short period of time.
After extracting data, it has to be physically transported to the target system or an intermediate system for further processing. Depending on the chosen way of transportation, some transformations can be done during this process, too. For example, a SQL statement which directly accesses a remote target through a gateway can concatenate two columns as part of the SELECT
statement.
The emphasis in many of the examples in this section is scalability. Many long-time users of Oracle are experts in programming complex data transformation logic using PL/SQL. Hear we suggest alternatives for many such data manipulation operations, with a particular emphasis on implementations that take advantage of Oracle’s new SQL functionality, especially for ETL and the parallel query infrastructure.
Chapter 1 - Introduction and Architecture
- Introduction to Informatica.
- Architecture of Informatica PowerCenter.
- Installation and configuration.
- Informatica Design Process.
- Domain,node and services.
- Repository Service.
- Integration service.
- Configure security.
- Informatica Client Tool.
- Designer,Workflow manager ,Workflow monitor.
- Designer tools.
- Source analyzer,target designer,Transformation developer.
- Mapping and Mapplet designer.
Chapter 2 - Mapping Designer
- Create a Simple Mapping.
- Overview of Transformations.
- Types of Transformations.
- Active and Passive.
- Expression Transformation.
- Source Qualifier Transformation.
- Introduction to Workflow Manager.
- Task Developer.
- Workflow Designer.
- Creating Connections.
Chapter 3 - Flat Files Handling
- Create Simple Mappings.
- Flat File Analysis.
Chapter 4 - Architecture Review
- Architecture Review.
- Differences between PowerMart and PowerCenter.
- Connecting to SAP-FTP Process.
- Command Task.
- Create a Mapping.
- Joiner Transformation.
Chapter 5 - Transformations
- Create a Mapping.
- Using Join in a Source Qualifier Transformation.
- Expression Transformation.
- Filter Transformation.
Chapter 6 - Transformations Session: 1
- Multiple Targets Loading.
- Router Transformation.
- Difference Between Filter and Router Transformations.
- Create a Mapping.
- Master Outer Join.
Chapter 7 - Transformations Session: 2
- Reusable Transformation.
- Lookup Transformation.
- Debugger.
- Handling Multiple Files as Source.
Chapter 8 - Mapplets
- Sequence Generator Transformation.
- Multiple Pipelines in a Single Mapping.
- Target Load Plan-Mapplets.
Chapter 9 - Update Strategy
- SCD Type 1 mapping Implementation.
- Update Strategy Transformation.
Chapter 10 - SCD Type 2 and Type 3
- SCD Type 2 and Type 3 mapping Implementation
Chapter 11 - Loading Strategy
- Sequential and Parallel Loading.
- Union Transformation.
- Dynamic Lookup Cache.
- Introduction to Fact Loading.
Chapter 12 - Transformations
- Date Dimension Loading.
- Lookup Transformation.
- Persistent Cache.
- Aggregator Transformation.
- Sorter Transformation.
- Creating Stored Procedure.
Chapter 13 - Transformations
- Normalizer Transformation.
- Stored Procedure Transformation.
- Active Vs Passive Transformations.
- Connected Vs UnConnected Transformations.
Chapter 14 - Implementing Incremental Loading
- Importance of Mapping Variable.
Chapter 15 - Implementing Incremental Aggregation
- Create a Mapping.
- Incremental Aggregation.
Chapter 16 – Transaction and Variable Port
- Transaction Control Transformation.
- Variable Ports.
Chapter 17 – Workflow Tasks
- Task Link Properties.
- Event Raise Task.
- Event Wait Task.
- Worklet .
- SQL Concepts.
- Data warehousing Concepts.
- Data Stage Concepts.
- DS Components.
- Repository and Table definition.
- Built-in Components.
- Stage Editors.
- Types of Parallelism.
- Partition Techniques.
- Containers (Local & Shared).
- Managing Metadata Environment.
- Importing & Exporting Jobs.
- General, File, Processing, Database, Debug/Development.
- Restructure stages, Transformer, Lookup-Job Sequences.
- Slowly Changing Dimension (SCD).
- Performance and Tuning Aspect,FAQ.