Wednesday, 22 February 2012

ODI Series – ASO parallel data loads

I have been meaning to write up this blog for a long time but never got round to it and a recent post on the essbase Oracle forum prompted me to revisit this topic.

As you may be aware there are a couple of methods available to parallel load data into an ASO database,  one method is to have multiple sessions of Maxl each performing a data load or the much more efficient way of using one Maxl statement using multiple load rules (up to eight).

The Maxl syntax is
Import database <appname>.<dbname> data connect as <SQLUSER> identified by <SQLPASSWORD> using multiple rules_file ‘<RULENAME>’,’<RULENAME>’.. to load_buffer_block starting with buffer_id <ID_NUMBER> on error write to …

A buffer is created for each load rule and the id starts with the number defined in the statement, so if you have two load rules starting at buffer id 100 then once the statement is executed buffers 100 and 101 will be created and data from the first load rule will be loaded to buffer 100 and the second to 101, once both data loads are complete then the content of the buffers will be committed to the database in one operation.

For an example I am going to load data to the ASOsamp.Sample database using two load rules with the parallel load Maxl and then go through the process to achieve the same using ODI and the essbase adaptor.


I created two SQL load rules DLSQL2,DLSQL3 which were pretty much identical in format.

import database AsoSamp.Sample data connect as ODISTAGE identified by 'password' using multiple rules_file 'DLSQL2','DLSQL3' to load_buffer_block starting with buffer_id 100 on error write to "F:\Scripts\dloaderror.txt";



A simple Maxl statement was created to use the parallel data load functionality starting with buffer id 100 and using the two rules files I had created.


If you query the load buffers while the data loads are taking place you can see that two buffers (100,101) have been created.

The default aggregation method is AGGREGATE_SUM which means “Add values when the buffer contains multiple values for the same cell.”

By default missing values are ignored but zero values are not.

Later on I will compare how ODI manages the buffers.

If you check the essbase application log when the parallel loads are being run you will see that the load buffers are initialised


The data loads then take place


And finally once the data loads are complete the buffers are committed in one operation.


So a nice and simple example which I will now try and replicate the functionality using ODI.

I am going to be using ODI 11g as load plan are available which make it much simpler to execute scenarios in parallel, it is certainly possible to achieve it in 10g using a package and the scenarios set to asynchronous mode.


The essbase ODI model reversing options was set to multiple data columns with measures being the data column and members “Original Price”, “Price Paid”, “Returns”, “Units”, “Transactions”, this will match the data load in the load rules which were originally set up for the Maxl method.


I created two similar interfaces with the source as a relational table and the target being the essbase database datastore, I had to use the SUM function on the members due to an issue I encountered when loading same cell records with multiple values, I will cover this issue towards the end of this blog.


In the IKM options you will notice there are three options relating to ASO, these options just appeared in one of the patch releases of 10g though I am not sure exactly which version that was, if you don’t have the options then you really should be patching ODI because many issues with the Hyperion adaptors have been addressed over time.

BUFFER_ID – This is exactly the same as the buffer id concept used with Maxl

BUFFER_SIZE – When performing an incremental data load, Essbase uses the aggregate storage cache for sorting data. You can control how much of the cache a data load buffer can use by specifying the percentage (between 0 and 100% inclusive). By default, the resource usage of a data load buffer is set to 100, and the total resource usage of all data load buffers created on a database cannot exceed 100. For example, if a buffer of 90 exists, you cannot create another buffer of a size greater than 10. A value of 0 indicates to Essbase to use a self-determined, default load buffer size

This is the same as resource usage in the essbase world, it differs slightly than that when used in Maxl it is a percentage between 0.1 and 1 and in ODI it is between 0 and 100.

GROUP_ID – This option is used when using parallel data loads, each interface that is going to be run in parallel will require the same group id. I am not sure exactly how internally it works but it looks like in the core Java code for the adaptor there is a buffer id manager and when interfaces are executed the buffer id is added to the manager, once all executing sessions are complete then the array of IDs in the buffer manager are committed at once.

Second interface KM options -

Each of the interfaces have a load rule defined and I set both interfaces to a group ID of 1, the first interface had a buffer id of 1 and the second a buffer id of 2, as there are two loads in parallel the buffer size in each of the interfaces was set to 50%.


A scenario was generated for each interface this is because when using load plans it is not possible to directly add an interface and a scenario is required.


A simple load plan was created and the two scenarios generated from the interfaces were added and set to execute in parallel.


After executing the Load Plan you can see in the operator that both the scenarios are executed in parallel.


Looking at the essbase application logs then it seems to follow the same concept as with the Maxl method, now it is not going to be exactly the same as the Maxl method uses SQL load rules and ODI uses the Java API and streams the data via the load rule.

SQL load rule using Maxl -

ODI Data load -

If you query the load buffers when using ODI to load data then there is a difference in some of the default properties being used.


The aggregation method being used is “AGGREGRATE_ASSUME_EQ” compared the Maxl default of “AGGREGATE_SUM”, I am not aware of the property that ODI is using and I am not sure why it doesn’t use the aggregate sum method.

Ignore missing are also set to false compared to the default of True when using Maxl, setting missing to True is the optimal performance.

The problem here is that with ODI you have no option to change these default values and are stuck with what has been hardcoded in the Java code, this is pretty poor in my opinion and I wish Oracle would wake up a little and start enhancing the adaptors and take data integration with EPM seriously before they are left behind, if they were enhanced then I am sure more would use them instead of being forced down firing off Maxl from within ODI.

I had a hunt around in the Java code to see how the buffers are initialised.


They use the loadBufferInit method which is available the essbase Java API.


The aggregation method in the adaptor code is being fixed to duplicates assume equal.

The options ignore missing and zero value is also hardcoded.

The options when the buffer is committed are also fixed so you lose the options to work with slices and overriding data.

In my opinion it really wouldn’t be touch much of a development exercise to make these options available to be set in the IKM like with buffer id and size.

Anyway this leads me on to the issue I experienced when loading multiple values to the same cell which may be down to the fixing of the aggregation method as shown above.


The above source data has multiple values being loaded against the same cell and if the data is loaded using Maxl then it loads without any issues.


com.hyperion.odi.essbase.ODIEssbaseException: Cannot Load buffer term. Essbase Error(1270089): Data load failed: input contains different values for the same cell [(Original Price, Curr Year, Jan, Sale, Credit Card, No Promotion, 26 to 30 Years, 20,000-29,999, Photo Printers, 017589, 13835): 240 / 236]

If you load the same data using ODI then the interface fails with an error about the input data containing different values for the same cell, this may be because of the aggregation property being used when the essbase adaptor initialises the buffer.

The workaround is to sum the data in the interface but I still consider this to be a bug with the knowledge module because you have no way of changing any of the default values, over to you Oracle.

1 comment:

  1. Hi John,

    I did the same thing in ODI 11g except that I have 4 rule files which and I am loading data from a single table to Essbase. I am getting error: specified load buffer usage is more than 0.5 of available 0.

    In all of my 4 interfaces I have group id 1. buffer id:1,2,3,4 and buffer: 25 each, and 4 different rule files.

    When I am loading 2 interfaces parallel it is loading but 4 parallel load fails.

    ReplyDelete

Note: only a member of this blog may post a comment.