The 19.05 EPM Cloud release introduced new functionality in Data Management that provides the ability to set workflow modes. The documentation gives the following information on workflow modes:
“By default, the data load process in Data Management is designed with a well-defined process flow that provides a full audit of the data load process and the ability to drill down and view data in Workbench. However, a full data flow may contribute to lengthy processing times due to archiving of data for audit purposes. Workflow mode options provide scalable solutions when processing large volumes of data, or when an audit is not required and performance is a key requirement.”
So basically, if you are not interested in viewing or downloading data from the workbench or in drill-through functionality, and you are looking for improved performance, then the workflow mode options may help.
I am not going to go into great depth, but behind the scenes the process flow in Data Management is built around relational database tables and many SQL statements; no doubt you have noticed them in the process logs, and increasing the log level will output in much more detail the SQL being executed at each step of the process. There are lots of database tables involved which I am not going to bore you with. For this post all you need to understand is that data flows through a staging table called TDATASEG_T, where mappings are applied, and ends up in a table called TDATASEG; the data that is viewable in the workbench comes from the TDATASEG table. There is also a staging table for mappings called TDATAMAP_T, and the applied mappings are archived in the table TDATAMAPSEG.
The three available workflow mode options are:
- Full
- Full (no archive)
- Simple
The “Full” option is the default and is the workflow mode you will be used to; there is no change to this, data flows in exactly the same way through the staging tables and can be viewed in the workbench. Drill-through functionality is available if required.
If the logging level has been increased to 5, then after the import and validate stage you will see in the process log the data being moved from the staging table TDATASEG_T to TDATASEG.
DEBUG [AIF]: INSERT INTO TDATASEG (columns)
SELECT columns
FROM TDATASEG_T
WHERE LOADID = x AND PARTITIONKEY = x AND CATKEY = x AND PERIODKEY = 'YYYY-MM-DD'
The mappings that have been applied are also archived.
DEBUG [AIF]: INSERT INTO TDATAMAPSEG (columns)
SELECT columns
FROM TDATAMAP_T
WHERE LOADID = x
Before looking at the other workflow modes it is worth picking up on the point that the mode can only be defined at target application level. To minimise the impact to any existing integrations and to separate the different workflow modes it is possible to add a duplicate target application.
The same application can be selected, and a prefix applied.
A duplicate application will then be available, and the workflow mode can be defined for the duplicate application.
So, let us have a look at the “Full No Archive” workflow mode.
This mode operates in the same way as the full mode until the export stage. If you run an import the same process will be followed, and after validation the data can be viewed in the workbench.
At this point the data is still being held in the TDATASEG table. After an export has been successfully executed the data is no longer available in the workbench.
In the process log there are entries to show that data has been deleted from the TDATASEG table.
INFO [AIF]: Executing delete statement:
DEBUG [AIF]: DELETE FROM TDATASEG WHERE LOADID = x
INFO [AIF]: Successfully executed delete statement
DEBUG [AIF]: Number of Rows deleted from TDATASEG: x
DEBUG [AIF]: CommData.deleteData – END
Now this workflow mode is not going to offer any direct performance benefits, but it does keep the space used by the TDATASEG table to a minimum, as data is deleted at the export stage. This mode is not going to be of any use if you want to view the data or keep it for audit purposes, and you do lose out on drill-through functionality.
Moving on to the “Simple” workflow mode, which is where the performance improvements should be seen.
The simple option does not move the data from the staging TDATASEG_T table to TDATASEG, and after the mappings have been processed in the TDATAMAP_T staging table they are not archived in the TDATAMAPSEG table.
As the data does not exist in the TDATASEG table, after the import and validate stage completes the data cannot be viewed in the workbench. This also means any auditing or drill-through capabilities are lost.
There is also a slight difference in the export stage. With the default full workflow mode, data is extracted from the TDATASEG table using a database view.
DEBUG [AIF]: SELECT '"'||ACCOUNT||'"','"'||ENTITY||'"','"'||UD1||'"','"'||UD2||'"','"'||UD3||'"','"'||SCENARIO||'"','"'||YEAR||'"','"'||PERIOD||'"',AMOUNT
FROM AIF_HS_BALANCES
WHERE LOADID = x
As already stated, the simple workflow mode does not load data into the TDATASEG table and the data persists in the TDATASEG_T staging table, which means the same view cannot be used.
From the logs it looks like another database view has been created which must extract data from the staging table.
DEBUG [AIF]: SELECT '"'||ACCOUNT||'"','"'||ENTITY||'"','"'||UD1||'"','"'||UD2||'"','"'||UD3||'"','"'||SCENARIO||'"','"'||YEAR||'"','"'||PERIOD||'"',AMOUNT
FROM AIF_HS_BALANCES_T
WHERE LOADID = x
Notice the database view name is suffixed with “_T” which indicates it will be looking at the staging table.
Please note: when using the simple workflow mode, the full import and export have to be run in the same process. This means you should not use the workbench to load data when using the simple mode.
If you execute from the “Data Load Rule” screen, the import and export will only be allowed together. If you don't run them together, the export to target will not contain any data.
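As a side note, a run like this can also be triggered outside of the UI with the Data Management REST API, submitting the import and export modes together in a single job. The sketch below is only illustrative; the instance URL, credentials, rule name and period names are assumptions that would need to match your own environment.

# Minimal sketch: submit a data load rule so the import and export run in one process.
# The instance URL, credentials, rule name and periods below are placeholders.
import requests

BASE_URL = "https://your-instance.oraclecloud.com/aif/rest/V1/jobs"   # assumed instance URL
AUTH = ("service.user", "password")   # use a proper credential store in practice

payload = {
    "jobType": "DATARULE",
    "jobName": "DM_LOAD_RULE",    # hypothetical data load rule name
    "startPeriod": "Jan-19",      # example period names
    "endPeriod": "Jan-19",
    "importMode": "REPLACE",      # import and export are submitted together,
    "exportMode": "STORE_DATA",   # which is what the simple workflow mode requires
    "fileName": ""                # blank = use the file defined on the rule
}

response = requests.post(BASE_URL, json=payload, auth=AUTH)
response.raise_for_status()
job = response.json()
print("Submitted job", job.get("jobId"), "with status", job.get("jobStatus"))

The EPM Automate rundatarule command provides the same functionality from the command line if scripting against the REST API is not an option.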
So how about timings: is the simple mode much faster than the full mode?
For a simple test I took an import file containing 1 million unique records of data. I ran the full import and export process for a single period for both full and simple modes. The time was taken from the start of the process up until the data was to be loaded to the target application.
Full mode = 10 minutes 52 seconds
Simple mode = 6 minutes 14 seconds
For a single period that is nearly a 43% improvement.
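If you want to capture a similar timing yourself, one rough approach is to submit the rule over REST and poll the job until it finishes, recording the elapsed time. Again this is only a sketch, with the same assumed URL, credentials and rule name as the earlier example, and it measures the whole job rather than stopping exactly at the point the data is handed to the target application.

# Rough timing sketch: submit the data load rule, poll the job status and record
# the elapsed time. Endpoint, credentials and rule details are assumed placeholders.
import time
import requests

BASE_URL = "https://your-instance.oraclecloud.com/aif/rest/V1/jobs"
AUTH = ("service.user", "password")

payload = {
    "jobType": "DATARULE",
    "jobName": "DM_LOAD_RULE",
    "startPeriod": "Jan-19",
    "endPeriod": "Jan-19",
    "importMode": "REPLACE",
    "exportMode": "STORE_DATA",
    "fileName": ""
}

start = time.monotonic()
job_id = requests.post(BASE_URL, json=payload, auth=AUTH).json()["jobId"]

# Poll until the job is no longer running; in the responses I have seen, a
# status of -1 means the job is still in progress.
while True:
    detail = requests.get(f"{BASE_URL}/{job_id}", auth=AUTH).json()
    if detail.get("status") != -1:
        break
    time.sleep(30)

elapsed = time.monotonic() - start
print(f"Job {job_id} finished as '{detail.get('jobStatus')}' in {elapsed / 60:.1f} minutes")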
I then ran the same test for 12 periods, which means 12 million records were processed. I do have serious issues with the default way that multiple periods are processed: instead of the data load being treated as one, each step of the process is repeated for every period, which is extremely inefficient. Hopefully one day this will be addressed.
The timings were:
Full mode = 2 hours 43 minutes
Simple Mode = 1 hour 11 minutes
From the timings you can see it certainly does make a big difference for multiple periods, though it does raise the question of why it took around 10 minutes to process a single period but 2 hours 43 minutes for 12. In theory it would be quicker to run each period individually.
I had to base the above timings on the point where the data file is generated at the export stage, because the process then failed with an Oracle database timeout error: “ORA-00040: active time limit exceeded - call aborted”. Unlike on-premise FDMEE, it is not possible to change the data load options to a SQL-based load, which loads directly to the target application instead of creating a file first before loading.
To further improve performance, you could also look at using expressions which I covered in a previous blog.