1. What is Incremental data loading?
Incremental data loading is the process of loading only the data that has been newly created or updated in the source system since the last load into the target system.
This differs from a full load, where the entire dataset is processed on every run.
2. Benefits of Incremental data loading
- Working on only selective data from the source system reduces the overhead on the ETL process, thereby reducing the overall run time.
- Any ETL load process is prone to failure for multiple reasons. As only selective data is processed, the risk involved is reduced.
- The historical accuracy of the data is preserved, making it easy to track the data processed over a particular period of time.
3. Implementation of Incremental data loading in IICS
There are multiple ways of implementing Incremental data loading in Informatica Cloud and each method has its own advantages.
Before discussing each method in detail, let us set up the source and target tables that will be used in the examples discussed below.
3.1 Setting up Source and Target tables for the demonstration
Below is the source table EMP containing the employee information. The fields Created_date and Modified_date default to SYSTIMESTAMP, meaning whenever a record is created, the system timestamp is inserted automatically into these fields.
CREATE TABLE EMP (
EMPLOYEE_ID NUMBER(6,0),
NAME VARCHAR2(20 BYTE),
SALARY NUMBER(8,2),
DEPARTMENT_ID NUMBER(4,0),
IS_ACTIVE VARCHAR2(1 BYTE),
CREATED_DATE TIMESTAMP DEFAULT SYSTIMESTAMP,
MODIFIED_DATE TIMESTAMP DEFAULT SYSTIMESTAMP
);
But we need the Modified_date to change whenever the record is modified. To achieve that, let's create a simple trigger on the table that maintains this field.
CREATE OR REPLACE
TRIGGER HR.MODIFIED_DATE_TRIGGER
BEFORE INSERT OR UPDATE ON HR.EMP
FOR EACH ROW
BEGIN
:new.MODIFIED_DATE := SYSTIMESTAMP;
END;
/
After this, whenever a record is created, the SYSTIMESTAMP value is loaded into both Created_date and Modified_date. But when the record is updated, Created_date remains the same while Modified_date changes to the current timestamp.
Let’s insert some sample data into the table.
INSERT INTO EMP(EMPLOYEE_ID,NAME,SALARY,DEPARTMENT_ID,IS_ACTIVE) VALUES(100,'Jennifer',4400,10,'Y');
INSERT INTO EMP(EMPLOYEE_ID,NAME,SALARY,DEPARTMENT_ID,IS_ACTIVE) VALUES(101,'Michael',13000,20,'Y');
INSERT INTO EMP(EMPLOYEE_ID,NAME,SALARY,DEPARTMENT_ID,IS_ACTIVE) VALUES(102,'Pat',6000,20,'Y');
INSERT INTO EMP(EMPLOYEE_ID,NAME,SALARY,DEPARTMENT_ID,IS_ACTIVE) VALUES(103,'Den', 11000,30,'Y');
INSERT INTO EMP(EMPLOYEE_ID,NAME,SALARY,DEPARTMENT_ID,IS_ACTIVE) VALUES(104,'Alexander',3100,30,'Y');
INSERT INTO EMP(EMPLOYEE_ID,NAME,SALARY,DEPARTMENT_ID,IS_ACTIVE) VALUES(105,'Shelli',2900,30,'Y');
INSERT INTO EMP(EMPLOYEE_ID,NAME,SALARY,DEPARTMENT_ID,IS_ACTIVE) VALUES(106,'Sigal',2800,30,'Y');
INSERT INTO EMP(EMPLOYEE_ID,NAME,SALARY,DEPARTMENT_ID,IS_ACTIVE) VALUES(107,'Guy',2600,30,'Y');
INSERT INTO EMP(EMPLOYEE_ID,NAME,SALARY,DEPARTMENT_ID,IS_ACTIVE) VALUES(108,'Karen',2500,30,'Y');
select * from EMP;
NOTE: The Created_Date and Modified_Date are auto populated in the EMP table.
Let us now create a target table to load the data and observe the incremental changes.
CREATE TABLE EMP_COPY(
EMPLOYEE_ID NUMBER(6,0),
NAME VARCHAR2(20 BYTE),
SALARY NUMBER(8,2),
DEPARTMENT_ID NUMBER(4,0),
IS_ACTIVE VARCHAR2(1 BYTE),
CREATED_DATE TIMESTAMP,
MODIFIED_DATE TIMESTAMP
);
NOTE: The Created_date and Modified_date are not assigned any default values, unlike in the source table.
3.2 Implementing Incremental Data Loading using IICS System Variables
IICS provides access to the following system variables, which can be used in data filters to pick up newly inserted or updated records.
$LastRunTime returns the last time when the task ran successfully.
$LastRunDate returns only the last date on which the task ran successfully.
For more information on these system variables, check out this Informatica article.
3.2.1 Mapping development steps
Select the table EMP as the source and define the filter condition on the field Modified_Date as shown below.
The system variable $LastRunTime is stored in the GMT timezone, while the data in the Oracle table EMP is stored in IST. So if the data being processed is not stored in GMT, a conversion factor must be added to the variable. As IST is five and a half hours ahead of GMT, the Oracle conversion of 5.5/24 (of a day) is added.
For the initial run, the value of $LastRunTime defaults to '1970-01-01 00:00:00'.
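Assuming the configuration described above, the advanced source filter would look something like the following (a sketch; the +5.5/24 offset applies only because the table data here is in IST):

```sql
-- Advanced filter on the Source transformation (sketch):
-- $LastRunTime is in GMT, so 5.5/24 of a day (5h 30m) is added
-- to align it with the IST timestamps stored in EMP
EMP.MODIFIED_DATE > $LastRunTime + 5.5/24
```

At run time, Informatica expands $LastRunTime into a TO_TIMESTAMP literal, as the session-log queries below show.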
In the Target transformation, select EMP_COPY as the target as shown below, map the source fields under the Field Mapping section, and save the mapping.
3.2.2 Demonstration
Once the mapping is saved and triggered, all 9 records from EMP are loaded into the EMP_COPY table.
The query fired by Informatica, as seen in the session log, is below.
SELECT HR.EMP.EMPLOYEE_ID,
HR.EMP.NAME,
HR.EMP.SALARY,
HR.EMP.DEPARTMENT_ID,
HR.EMP.CREATED_DATE,
HR.EMP.MODIFIED_DATE
FROM HR.EMP
WHERE EMP.MODIFIED_DATE > to_timestamp('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')+5.5/24
Now let’s update a record and see how the mapping behaves.
UPDATE EMP SET IS_ACTIVE = 'N' WHERE EMPLOYEE_ID = 108;
We can now observe that Modified_date is updated for employee 108 in the EMP table.
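The change can be verified directly in the database (a quick sketch):

```sql
-- Confirm the trigger bumped Modified_date for the updated row:
-- Created_date keeps its original value, Modified_date shows the update time
SELECT EMPLOYEE_ID, IS_ACTIVE, CREATED_DATE, MODIFIED_DATE
FROM   EMP
WHERE  EMPLOYEE_ID = 108;
```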
Let us trigger the mapping and see the query fired by Informatica and data processed.
Only one record is processed as expected.
The query in session log is as below.
SELECT HR.EMP.EMPLOYEE_ID,
HR.EMP.NAME,
HR.EMP.SALARY,
HR.EMP.DEPARTMENT_ID,
HR.EMP.CREATED_DATE,
HR.EMP.MODIFIED_DATE
FROM HR.EMP
WHERE EMP.MODIFIED_DATE > to_timestamp('2020-10-23 12:50:17','YYYY-MM-DD HH24:MI:SS')+5.5/24
The time conversion of to_timestamp('2020-10-23 12:50:17','YYYY-MM-DD HH24:MI:SS')+5.5/24 translates to 2020-10-23 17:50:17. This is the exact time we first triggered the job, when all 9 records were processed.
It is important to understand that $LastRunTime stores the time the task last ran, not the maximum Modified_date value from the table.
Hence the timezone conversion is mandatory when the source data is not stored in GMT.
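The offset itself can be sanity-checked in Oracle (a minimal sketch):

```sql
-- 5.5/24 of a day equals 5 hours 30 minutes,
-- so GMT 12:50:17 becomes IST 17:50:17
SELECT TO_TIMESTAMP('2020-10-23 12:50:17','YYYY-MM-DD HH24:MI:SS') + 5.5/24 AS ist_time
FROM DUAL;
```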
3.3 Implementing Incremental Data Loading using IICS Input-Output Parameters
3.3.1 Mapping development steps
Create a new mapping and from the Parameter panel in the mapping create an Input-Output Parameter.
Enter the name of the parameter, set the data type to string with a size of 40, and provide the default value as shown below. Leave the rest of the properties as they are and click OK.
Under Source select EMP as source object and define the filter as shown below.
Filter: EMP.MODIFIED_DATE > TO_TIMESTAMP('$$MaxDate','YYYY-MM-DD HH24:MI:SSXFF')
Map the source fields to an Expression transformation. In Expression transformation create a new field and assign the value as below.
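Based on the step-by-step notes below, the new field (named OutMaxDate here, to match the field mapped to the flat-file target) would be assigned an expression along these lines (a sketch; the TO_CHAR format mask is an assumption and must match the $$MaxDate default and the source filter's format):

```sql
-- Expression transformation (sketch):
-- SETMAXVARIABLE retains the maximum value seen across all rows in $$MaxDate;
-- TO_CHAR converts Modified_date to a string, since $$MaxDate is a string parameter
OutMaxDate = SETMAXVARIABLE($$MaxDate, TO_CHAR(MODIFIED_DATE, 'YYYY-MM-DD HH24:MI:SS.US'))
```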
The target configuration is the same as discussed in the previous method. Select EMP_COPY as the target and complete the required Field Mapping.
Create another flat-file target, map only the OutMaxDate field from the Expression transformation into it, and save the mapping.
Now let us understand, step by step, what we have done here:
- We created an Input-Output Parameter of type string, which is analogous to a variable in Informatica PowerCenter, and defined a default value for it.
- In the source, we defined a filter on the Modified_date field based on the parameter we created. As the parameter is of type string, it is converted into a timestamp using the TO_TIMESTAMP function.
- For the initial run, the mapping uses the default value we defined for the MaxDate parameter, which is 1970-01-01 00:00:00.000.
- In the next step, we calculate the maximum Modified_date value across all records using the SETMAXVARIABLE function and assign it to the MaxDate parameter.
- As the MaxDate parameter is of type string, Modified_date is converted into a string using the TO_CHAR function.
- So by the end of the mapping run, the parameter holds the maximum Modified_date value of all processed records, which is used in the source query of the next run.
- We created a couple of targets: our primary target to load the data, and another just to observe how the MaxDate value changes while each record is processed.
The final mapping will be as shown below