Saturday 8 September 2018

EPM Cloud Data Management – Incremental File Adapter

New functionality has been added to Data Management in the EPM Cloud 18.09 release, and in this post I am going to cover the incremental file adapter.

The monthly cloud release documentation contains the following summary:

“Data Management now includes an incremental file adapter to optimize the data load process. The new Incremental adapter compares the source data file with a prior version of the source data file and identifies new or changed records and loads only this data set. The user has the option to sort the input source data file before making the comparison or they can provide a pre-sorted file for better performance.”

So basically, if you are loading files which contain a full data set, then before this release you would need to replace the existing data in Data Management and reload, which, depending on the size of the source file, can cause performance issues. With the new incremental file adapter, the previous file is compared with the latest file and only the differences are loaded, so in theory performance should be much improved.
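To illustrate the concept, here is a minimal Python sketch of the idea. This is not the actual logic used by Data Management's script, just a hypothetical illustration of extracting new or changed records by comparing the latest file with the previous one:

# Hypothetical sketch of an incremental file comparison.
# Records in the current file that are not present in the previous
# file are treated as the incremental (new or changed) data set.
def incremental_records(previous_path, current_path):
    with open(previous_path) as f:
        f.readline()                        # skip the one-line header
        previous = set(f.read().splitlines())
    with open(current_path) as f:
        header = f.readline().rstrip("\n")  # header describes the columns
        changes = [row for row in f.read().splitlines()
                   if row and row not in previous]
    return header, changes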

To get a good understanding of the adapter it is worth going through a simple example.

There are some important points that should be considered when working with the adapter:
  • The source data file must be a delimited data file.
  • Data files used must contain a one-line header, which describes the delimited columns.
  • Both numeric and non-numeric data can be loaded.
  • Any deleted records between the two files are ignored. In this case, you have to handle the deleted records manually.
  • If the file is missing (or you change the last ID to a non-existent run), the load completes with an error.
  • Sort options determine the level of performance using this feature. Sorting increases the processing time. Pre-sorting the file makes the process faster.
  • Only single period data loads are supported for an incremental load. Multi-period loads are not supported.
  • Drill down is not supported for incremental loads since incremental files are loaded in Replace mode and only the last version of the file comparison is present in the staging table.
  • Copies of the source data file are archived for future comparison. Only the last 5 versions are retained. Files are retained for a maximum of 60 days. If no incremental load is performed for more than 60 days, then set the Last Process ID to 0 and perform the load.
Some of the above points will become clearer with the following example, so let’s get started.

The file I am going to begin with only contains a few records as this is just meant to be a simple example.


The file satisfies the first two points: it contains a header record which describes the columns, and it is delimited.
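The layout is along these lines, based on the header record that appears later in the logs (the member names and amounts below are purely for illustration):

Account,Entity,Product,Amount
4110,410,P_100,1500
4120,410,P_110,2750
4130,420,P_100,900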

I first thought that the incremental file adapter would be configured from the source system area in Data Management, but this is not the case; it is created as a target application.

You need to go to Target Application and add a new data source.


Once “Data Source” has been selected, a new window will open where you can select “Incremental File” as the source system.


This will open the file select window where you can select an existing file or upload one.


The filename will be the basis for the name of the target application.


If you intend to use multiple files with the same naming convention, then you should use the prefix option to make the target applications unique.

Selecting the file at this point is only to define the dimension details in the application details.


The dimension names are automatically created from the header record in the source file.

The steps to create an integration are the same as you would normally go through, so I am only going to highlight some of them.

In the import format creation, the newly created application should be available as a source.


The source columns of the file are then mapped to the target application.


The location set up is no different.


On to the data load rule, which is slightly different from a standard file-based load; the file is selected in the source filters section.


There are two other properties in the source filters; the incremental processing option has the following possible values:

Do not sort source file - the source file is compared as provided. This option assumes that the source file is generated in the same order each time. In this case, the system performs a file comparison and then extracts the new and changed records. This option makes the incremental file load perform faster.

Sort source file - the source file is sorted before performing the file comparison for changes. The sorted file is then compared to the prior sorted version of the file. Sorting a large file consumes a lot of system resources and performs slower.
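To give an idea of why a pre-sorted file helps, a comparison of two sorted files can be done in a single pass over both, along the lines of this hypothetical sketch (again, not the actual implementation used by Data Management):

# Hypothetical sketch of comparing two pre-sorted delimited files.
def data_rows(path):
    with open(path) as f:
        f.readline()                      # skip the one-line header
        for line in f:
            yield line.rstrip("\n")

def incremental_records_sorted(previous_sorted, current_sorted):
    # Walk both sorted files in step (merge style); rows in the current
    # file that are not in the previous file are the increment.
    changes = []
    prev = data_rows(previous_sorted)
    p = next(prev, None)
    for c in data_rows(current_sorted):
        while p is not None and p < c:
            p = next(prev, None)          # rows only in the old file (deletes) are ignored
        if p != c:
            changes.append(c)             # new or changed row
    return changes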

For this example, I am going to go with not sorting the source file, not that it would make much difference in my case because the file is so small. Once I get the opportunity I would like to test the performance of having to sort a large file.

Next is the last process ID; the first time you load a file this would be set to 0, and when the load is run the ID will be updated to match the process ID.

If there are no differences between the latest file and the previous file or the source file doesn’t exist, the ID will be kept as the last successful load ID.

To reload all the data the process ID should be set back to 0 and the file defined in the source filter will be considered the baseline file.

If you are going to change the sorting option, then to prevent data issues you will need to reset the process ID back to 0 and reload.


After selecting the source file, some simple pass through mappings are added, which means the file can be loaded.


The data load acts like any other; it is only when you take a look at the log that you get an idea of what is going on.

INFO  [AIF]: Executing the following script: /u02/Oracle/Middleware/EPMSystem11R1/products/FinancialDataQuality/bin/plugin/IncrementalFile.py
INFO  [AIF]: ************************************************************************************
INFO  [AIF]: *      IncrementalFile.py Started for LoadID: 95
INFO  [AIF]: ************************************************************************************
DEBUG [AIF]: loadID: 95
DEBUG [AIF]: ruleID: 14
DEBUG [AIF]: LAST_PROCESS_ID: 0
DEBUG [AIF]: INCREMENTAL_PROCESSING_OPTION: NO_SORT_NO_MISSING
DEBUG [AIF]: Source file: /u03/inbox/inbox/incremental.csv
DEBUG [AIF]: Data file: /u03/inbox/data/appdata/sourcefiles/14/Vision_SOURCE_95.dat
DEBUG [AIF]: Deleting old source files from /u03/inbox/data/appdata/sourcefiles/14
DEBUG [AIF]: impGroupKey: MT_Incremental_IF
DEBUG [AIF]: Import format delimiter = ,
DEBUG [AIF]: Instantiated FileDiffUtils.
DEBUG [AIF]: Full load.  No incremental file.
DEBUG [AIF]: dataFileName: Vision_SOURCE_95.dat

It looks like a Python/Jython script handles the incremental files and differences. The file is moved into a source file directory numbered by the load rule ID, in this case 14, and the file format is <TARGET_APP>_SOURCE_<LOADID>.dat

As the last process ID is 0, it is considered a full load, so no incremental differences are required and the file is loaded.

If you go back into the load rule you can see the last process ID has been updated to reflect the load ID of the last import.


Now, time to add additional records to the same file. As I had selected not to sort the source file in the load rule, I made sure the records were kept in sorted order.


Run the process again, and this time in the logs you can see that a difference file has been created and loaded; the format of the difference file is <TARGET_APP>_DIFF_<LOADID>.dat

DEBUG [AIF]: Source file: /u03/inbox/inbox/incremental.csv
DEBUG [AIF]: Data file: /u03/inbox/data/appdata/sourcefiles/14/Vision_SOURCE_96.dat
DEBUG [AIF]: Deleting old source files from /u03/inbox/data/appdata/sourcefiles/14
DEBUG [AIF]: impGroupKey: MT_Incremental_IF
DEBUG [AIF]: Import format delimiter = ,
DEBUG [AIF]: Instantiated FileDiffUtils.
DEBUG [AIF]: Diff file: /u03/inbox/data/Vision_DIFF_96.dat
DEBUG [AIF]: Header line: Account,Entity,Product,Amount
DEBUG [AIF]: dataFileName: Vision_DIFF_96.dat

Back in the load rule the last process ID has been updated.


In the workbench I spotted an issue.


There was one record from the previous load which, in theory, should not have been loaded again.

I was thinking maybe it was the way I sorted the file, so I decided to run the process again but this time set the processing option to sort the file.


I reset the last process ID back to zero and uploaded my original file and ran the process again.


In the logs you can see entries to show that the source file is being sorted.

DEBUG [AIF]: Sorting data file started.
DEBUG [AIF]: Sorting data file completed.

Uploaded the incremental file and ran the process.


In the logs you can see the source file is being sorted and a difference file created.

DEBUG [AIF]: Sorting data file started.
DEBUG [AIF]: Sorting data file completed.
DEBUG [AIF]: Header line: Account,Entity,Product,Amount
DEBUG [AIF]: dataFileName: Vision_DIFF_98.dat

I was hoping that the issue would be fixed and the record from the full data load would not be included in the incremental load.


Unfortunately, it is still loading the record, so unless I am missing something there is a problem with the script that is working out the differences.

I thought I would have a look at the files that have been generated by Data Management; the files can be downloaded using EPM Automate or REST.
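For example, something along these lines with EPM Automate (the login details are placeholders and the relative path may vary between environments; the file name matches the one shown in the logs):

epmautomate login epm_user password https://cloud-instance-url
epmautomate downloadfile "data/appdata/sourcefiles/14/Vision_SOURCE_96.dat"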


The files were sorted in exactly the same way as in my original testing so at least I know I got that correct, though the difference file is not correct as it contains a record from the first full load.


I then found a workaround: I created the incremental file by appending the new records to the end of the original file.


I then set the processing option not to sort the source file.


After resetting the last process ID back to zero and reloading both files, this time the incremental data was correct and did not contain the record from the original load.


Anyway, moving on, if I try to load the file again with no changes then the difference file will contain no records.


The process will be displayed as a warning.


No records will exist in the workbench.


Going back to one of the earlier points:

“Only the last 5 versions are retained. Files are retained for a maximum of 60 days.”

You will see an entry in the log when you go over 5 versions.

DEBUG [AIF]: Deleting old source files from /u03/inbox/data/appdata/sourcefiles/14
DEBUG [AIF]: Deleted file: /u03/inbox/data/appdata/sourcefiles/14/Vision_SOURCE_95.dat

For each load rule there will be a maximum of 5 archived source files on the file system.
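The clean-up looks to be handled by the same script; conceptually it would be something like this hypothetical sketch, which keeps only the newest five files in the rule's source file directory:

# Hypothetical sketch of the archive clean-up: keep only the newest
# "keep" files in the load rule's source file directory.
import os

def prune_source_files(archive_dir, keep=5):
    files = sorted((os.path.join(archive_dir, name) for name in os.listdir(archive_dir)),
                   key=os.path.getmtime)
    for old_file in files[:-keep]:
        os.remove(old_file)
        print("Deleted file: " + old_file)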


It is still possible to run the data load rule with EPM Automate or REST and pass in the filename.

First the file can be uploaded.


Next the data load rule can be executed by passing in the filename.
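For example, something like the following (the rule name, period and load methods are placeholders; the filename matches the one in the logs below):

epmautomate uploadfile sep-18-actuals.csv inbox
epmautomate rundatarule "Incremental Load" Sep-18 Sep-18 REPLACE STORE_DATA sep-18-actuals.csv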


In the process logs you will see an entry for EPM Automate and the filename.

DEBUG [AIF]: epmAutomateFileName: sep-18-actuals.csv
DEBUG [AIF]: Source file: /u03/inbox/inbox/sep-18-actuals.csv

I did begin to have a look at loading larger files to check out the performance when sorting is enabled, but I seemed to hit more problems.

I started with a file like in my previous example but this time containing 50,000 records. I then loaded an incremental file containing 100,000 records which included the 50,000 from the original file.
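If you want to generate similar test files, a few lines of Python will do it (purely a hypothetical helper for creating a large delimited file with a header, nothing to do with the adapter itself):

# Hypothetical helper to generate a large delimited test file.
import random

def generate_test_file(path, num_records):
    with open(path, "w") as f:
        f.write("Account,Entity,Product,Amount\n")
        for i in range(num_records):
            f.write("{0},{1},{2},{3}\n".format(50000 + i, 4000, 1200, random.randint(1, 9999)))

generate_test_file("incremental_50k.csv", 50000)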

For some reason 199,999 records were loaded to Data Management.

In the log I could see:

DEBUG [AIF]: dataFileName: Vision_DIFF_116.dat

HEADER_ROW] Account,Entity,Product,Amount
INFO  [AIF]: EPMFDM-140274:Message - [TC] - [Amount=NN] 99990,4000,1200,9299Account,Entity,Product,Amount
INFO  [AIF]: EPMFDM-140274:Message - Rows Loaded: 199999
Rows Rejected: 1

I downloaded the file containing the differences and the file contained 200,001 records.


It seems to have created the difference file by joining the incremental file to itself.


I am concerned about some of the issues I have seen. It may just be me, but if you are going to use the adapter functionality then please do carry out testing first.

I am going to leave it there for this introduction to the incremental file adapter. If I find any more details about the problems I have experienced, then I will update this post.


2 comments:

  1. Any idea if they have addressed any of the bugs you found? They seem to be quite fundamental issues with the functionality.

  2. Not sure, I have not tested the functionality since it was released.

