Sunday, 29 March 2009

ODI Series - Reversing a collection of files from an excel template

If you have a collection of flat files that need reversing or change occasionally then it can be time consuming to reverse them one by one and set up their parameters, currently there is no way of automatically reversing files from a model.

There is a Knowledge module “RKM File (FROM Excel” available to help manage flat files, this RKM provides the ability to reverse files and maintain the structure definition based on the information held in an excel template.

There is an example of the excel template at :- \OraHome_1\oracledi\demo\excel\file_repository.xls
The template consists two worksheets, the first holding the file information and relates to the file tab in a file datastore

The second holds the column information for the files and relates to the columns tab in a file datastore.

If you highlight the column heading a brief description is given of what the column is used for, most of them are self-explanatory anyway. Most of the information can be cross-referenced with a datastore for example finding the code for field seperator.

Please note though if you are going to use the templates do not change the structure of the file as the RKM will fail or not produce the desired results.

Ok, say I have the following files I want to maintain their structure in the template.

The first step is to fill in the relevant information into the files sheet of the template, most of the parameters will be the same so it is easiest just to copy and paste rows and change file names, table name and alias.

Now the columns sheet needs to be populated with the structure of the columns in the files.

Now the template is complete an ODBC connection is required so the ODI excel technology can access it.

Be careful when picking the excel driver, I tried to use the latest driver which comes with excel 2007 and had problems with it, reverting to an older driver fixed the issue.

In the topology manager a new data server is required using the JDBC driver sun.jdbc.odbc.JdbcOdbcDriver and the JDBC URL pointing to the ODBC data source name just created e.g. jdbc:odbc:FILETEMPLATE

In the physical schema it is easiest to give the Logical Schema the name of “EXCEL_FILE_REPOSITORY” as this the name the RKM will be looking for, it is possible to create a different name though the RKM will need to be updated.

Before moving on to the next stage be sure to have created all the necessary components pointing to the location of the flat files in the topology manager (I have covered how to do this in an early blog)

Moving on to the designer, if you don’t have the “RKM File (FROM EXCEL)” imported then this is a good time to import it.
A new model can be created using the File technology and the associated logical schema.

In the Reverse tab select customized and the RKM should be automatically selected.

Apply and reverse to build the file datastores from the template, make sure the excel template is closed before running the reverse otherwise it will fail, after reversing it is good practice to check in the operator that everything was successful.

So there we have it an easy way of managing the structure of files from an excel template.

Sunday, 22 March 2009

ODI Series - Excel functionality

I was going to start taking a look at using the web services functionality in ODI to invoke a web service, one of the products within the EPM area that using web services is EPMA so my idea was to look at integrating the two components. Unfortunately during my testing I ran into a few issues and surprisingly they were with ODI and not EPMA, I spent a while trying to resolve the issue but with no success.

I have since logged the problem with Oracle and they have confirmed it as a bug, I am not sure of the timeframe to fix the bug and there is no workaround so I am going to have to wait on that one.

So today I am going to look at a few areas where you can use excel with ODI. The first method I will be looking at is extracting data from excel and loading into a database table, now the database table could easily be replaced with planning or essbase. I first would like to stress that extracting information directly from excel is not the most efficient method and can be resource hungry but it is just to give you an idea of how it can be done. The best method would be to try to manipulate the excel file using VBA and turn it into a csv format which could then be loaded in faster using ODI.

Anyway lets consider that there is no option to change the excel file and the requirement is to extract the data from it.

Apologises for the style of the report I didn’t want to spend too long creating a masterpiece.

So the object of this exercise is to get the information from the excel file into the above database table. To achieve this the structure of the excel file has to be reversed in the ODI datastore format but before this is done named ranges have to be created in the file, this is because the ODI excel technology works on the basis of turning named ranges into datastores.

The named range “Colas” takes care of the product and period fields.

The named range “MrktScenMeas” relates to the fields Market,Scenario and Measures, the top row is hidden in the excel file but is required in the named range so it can be converted into column names in the ODI datastore. A final named range “RootBeers” was added that highlighted the second table of data in the excel file.

Next step is to create an ODBC connection to the excel file as this is the technology that ODI uses to integrate with excel.

Now to set up the connection details in the Topology manager, first insert a data server using “Microsoft Excel” technology.

The user/password section is left blank.
In the JDBC information the driver to be used is :- sun.jdbc.odbc.JdbcOdbcDriver
and the format for the JDBC url is :- jdbc:odbc: <odbc_dsn_alias>

Replacing <odbc_dsn_alias> for the name which was given in the previous step so in my case it will be :- jdbc:odbc:ODI_EXCEL_REPORT

Once this is applied the physical schema window is opened, the only information you have to supply is the logical schema name.

On to the designer where the reversing can being, first a new model is created, the technology of “Microsoft Excel” and logical schema is selected.

In the reverse section just the context is chosen and Standard is left as default for the type of reverse, in the “Selective Reverse” section the named ranges that were created will be displayed and you then have the option to pick and choose the ones you want to reverse into datastores.

Once reversed the datastores will be created with the column names to match the excel file.

If you right click the datastore you should be able to view the data from the excel file

Now that the models are complete it is time to build an interface to load the data from excel straight into the database table. I am not going into reversing the database table as I have covered that off in past blogs.

I am using the target database repository, as the staging area so there is no need to select the staging area different from target option.
In the diagram section the db table datastore are dragged as the target datastore and the two excel datastores dragged onto the source area.

The problem at the moment is there is no join between the two source datastores and there is no common column to create a join.

To get around this you can drag one of the columns on the source datastores on to the other source datastore to create a join, select the staging area to execute on and click the Cross option, this creates a cross join or otherwise known as a Cartesian join which combines each row of the two datastores.

In the flow area the LKM should be automatically selected as LKM SQL to SQL and the IKM I chose was IKM SQL to Control Append, the FLOW_CONTROL option has to be set to No as there are no key fields in the integration.

To load the other table of data in the excel file then the interface would be exactly same with just the datastore with the product/periods changing, the two interfaces can be then put into a package to seamlessly load the data. Make sure you don’t have the excel file open when you run the interfaces otherwise it will fail.

Next time I will look at automatically generating file datastores from information held in an excel template using the RKM File (FROM EXCEL)

Sunday, 8 March 2009

ODI – Hyperion Knowledge Modules - Reporting statistics correctly

If you have ever took looked at the steps in the Hyperion KM’s you will notice each one of them has one called “Report Statistics”

The step is there to record the details of the number of rows processed or the number of rows rejected, after running an integration using one of the KMs and looking in the operator you will see the step producing a warning

The step is set to “Ignore Errors” so it never fails on this step but why is it set up like this?

If you look at any of the Hyperion KMs and the step before the “Report Statistics” you will see that the results are returned into an object

This information is generated from a Java class called Statistics; the class has two variables known as successRows and errorRows and these variables are populated during the execution of the Knowledge Module.

Now if you look at the code “Report Statistics” step.

The step is executing a method toString() in the Statistics class, this method takes the two variables (successRows and errorRows) and formats it into it into an output message.

Number of rows successfully processed: <successRows>
Number of rows rejected: <errorRows>

Now in earlier versions of ODI there wasn’t really anything you could do with these results so the only option was to raise an error so that the information is displayed and the line raise “Planning Writer Load Summary”, statString is doing this.

If you have a look in the operator for any of the Hyperion KM report statistics steps you can see it in action.

This is the reason why the step has “Ignore Errors” selected so the integration doesn’t fail here.

From ODI version there were some new substitution methods introduced; substitution methods are basically Java methods that are accessible from Knowledge Modules

These new methods are setNbInsert, setNbUpdate, setNbDelete, setNbErrors and setNbRows, if you look at the above image you will see boxes next to “No. of Rows”, “No. of Updates” etc, using the methods will set the value in these boxes.

The format of each of the methods is the same and you just have to pass a value into it, so say you wanted to use the setNbInsert method the code would be

OdiRef.setNbInsert(<value>) e.g. OdiRef.setNbInsert(10) would set the number of inserts to the value of 10

Now these methods are available it is easy to update any of the Hyperion KMs to use them though remembering at present the KMs only return values for number of records inserted and number of records rejected.

Updating the Report Statistics KM to the above will now set the Number of records inserted, number of records rejected and the total number of records processed. It is also possible to remove the “Ignore Errors” selection, as no exception will be raised anymore.

So now when you run your integration the statistics will be recorded correctly.

I am sure in future releases of the KMs the statistics step will be updated but it is also useful to know what is actually happening and having the ability to fix it.

Sunday, 1 March 2009

ODI Series – Essbase/Planning – Automating changes in data - Part 2

In part 1 I went through simple journalizing, today I am going to look at using consistent set journalizing. Consistent set journalizing tracks changes to a group of the model’s datastores, taking into account the referential integrity between the datastores.

I am going to use the CDC on two tables that hold planning dimension information; one table has the parent/child/alias details and a primary key of ID.

The second table holds the member properties.

The referential integrity is set on the ID column meaning a record cannot be inserted into the table unless the ID exists in the member’s table.

The idea of today’s example is that I am going to capture updates to these tables and then update planning with the changes, the principle of using consistent journalizing is pretty much the same as using simple so I will assume that you have read part 1 so I don’t have to repeat too much.

A new model was created and the above two tables reversed.

This time the consistent set option was chosen under the Journalizing tab

Next step was to add both the tables to use CDC, right click over the datastore and select.

Now when using a group of datastores you have to make sure that the order of the tables is set up correctly, for instance in my example the members table must be before the properties table to preserve the referential integrity.

If you click on the organize tables icon in the journalized tables tab it will set up the order by basing it on the tables foreign keys.

The subscriber can now be created just like when working with simple journalizing.
Once this is applied a session runs to create the subscriber database table, also four extra tables are created these tables are not created when using simple journalizing.



The other two tables created are SNP_CDC_SET_NAME & SNP_CDC_OBJECT, these tables are not populated until the journal is started.

Now the journal can be started by right clicking the model > Changed Data Capture > Start Journal

The session creates the tables, views and triggers required for journalizing.

The table SNP_CDC_OBJECT holds the information about what has been created.

As you can see triggers are created on the two tables that are going to be monitored for changes (HIER_MEMBERS and HIER_PROPERTIES)

Tables are created which will hold the information about the changes (J$HIER_MEMBERS & J$HIERPROPERTIES)

The window_id field relates to the information stored in the SNP_CDC_SET table, the ID field relates to the ID in the monitored table.
When a change happens to the monitored tables, the trigger is fired and this table will be populated with the ID of the record.

Views are also created which will display any changes that have occurred.

The process of how the changes are tracked in the database tables is a lot more complicated and involved than when using simple journalizing, I am not going to go into the depths of that detail and concentrate more on how to set the process in ODI.

The next stage is to set up the interface that will load the changes into planning, the two datastores are dragged onto the source, and the join between the two tables is automatically created. Just like with simple journalizing you need to tick the “Journalized data only box” which creates three extra journal fields to the datastore (JRN_SUBSCRIBER, JRN_FLAG, JRN_DATE)

The automatically generated filter needs updating to change the name of the subscriber to the one you are using.

I also added to the filter so only inserts/updates to the table are retrieved; the rest of the interface is set up just like you would for any of loading metadata to planning.

After inserting a record into the monitored datastores you can view the changes from ODI using “Changed Data Capture” > Consumption > “Extend Window”

This will execute a session that computes all the changes to the datastores.

Selecting to display the data uses the SQL view to retrieve any records where updates have occurred.

Executing the interface will load any updates straight to planning, to reset the journal capturing after you have run the interface can be achieved by using “Changed Data Capture” > Consumption > “Purge Journal”

This is fine but you need to be able automate using the “Extend Window” and Purge functionality; this can be performed in a package.

Before I move on to the package I duplicated the interface and removed the properties table, updated the filter to retrieve records where the flag = ‘D’ which means it would retrieve journalized records that have been deleted, on the target I added ‘Delete Level 0’ to the operation column. This interface will now remove any members from planning where they have been removed from the journalized datastore.

Right, on to the package, the first step is just like I showed in Part 1 by using the ODI tool “OdiWaitForLogData

The only difference being that with consistent set journalizing you can’t use the “Table Name” parameter and have to use the CDC Set parameter.

In the CDC Set parameter it needs to be the name of your logical schema and model name, this can be retrieved by using an ODI API function, the code needs to be set as :-

<%=odiRef.getObjectName("L","model_code","logical_schema", "D")%>

In my example the logical schema is called CDC_PLANNING and the model CDC_Planning so the API function required to be updated to :-

Now to perform the equivalent function of “Extend Window”, drag the model onto the diagram and set the type to “Journalizing Model” and check the “Extend Window” box.

The two interfaces that load any updates and remove members are dragged onto the diagram.

Now that the planning loads have been run we need to reset the journalizing by using the purge command, this can be done like before by dragging the model on to the diagram but this time making sure the “Purge Window” box is checked.

I thought this would be the end of the story but when the purge was run it didn’t reset the journal so the details were still held in the journal tables (J$HIER_MEMBERS & J$HIER_PROPERTIES) which meant if I ran the process again it would load the same records again.

I will try and explain why this is happening and how to overcome it, if you are not interested in the techie bits then I would skip this section in case you fall asleep.

When you run the command “Purge Journal” it carries out the following steps.

So say in my example I had the values CUR_WINDOW_ID = -1, CUR_WINDOW_ID_DEL= -1, CUR_WINDOW_ID_INS = -1

The table would now hold

The next stage is that it updates the journal tables with the value from CUR_WINDOW_ID_INS

So if you have updated any records then they will exist in the journal table and will be tagged with the value in CUR_WINDOW_ID_INS

I had already inserted a record into the table I was monitoring so the window_id column was updated.

It repeats this process for each journal table and follows a similar process for any deletions but uses the value from CUR_WINDOW_DEL

Ok, so what happens when a purge is run; well the following SQL is run for each journal table.

This time it is deleting any values in the journal table where the window_id value is less than or equal to minimum value in the min_window_id column of the SNP_CDC_SUBS table.

So the logic is delete anything in the journal table where the window_id is less than or equal to -1, this means nothing will be deleted so the purge in this circumstance is doing nothing.

This is where the CDC functions Lock Subscriber and Unlock Subscriber come into play.
If you run the Lock Subscriber function and select the subscriber it carries out the following.

The question mark relates to the subscriber that was chosen, it runs some jython code that populates the question mark with subscriber info.

It is updating the SNP_CDC_SUBS table with values from the SNP_CDC_SET table, so after it has completed the SNP_CDC_SUBS table in my example looks like :-

Running the purge now would still not result in any records being deleted because the MIN_WINDOW_ID value has not changed.

So finally lets see what happens when the Unlock Subscriber function is run.

This updates the SNP_CDC_SUBS table and sets the MIN_WINDOW_ID value to the value of MAX_WINDOW_ID_INS which results in

So now there is a value of 0 in the MIN_WINDOW_ID column and running the purge will this time delete records from the journal table

In the journal table the WINDOW_ID column is 0 and the purge command logic will delete anything which is less than or equal to the MIN_WINDOW_ID value (0)

This is what the documentation has to say about “Lock Subscribers”

Although the extend window is applied to the entire consistency set,
subscribers consume the changes separately. This operation performs a subscriber(s) specific
"snapshot" of the changes in the consistency window. This snapshot includes all the changes
within the consistency window that have not been consumed yet by the subscriber(s). This
operation is implemented using a package step with the Journalizing Model Type. It should
be always performed before the first interface using changes captured for the subscriber(s).

Basically if you want to capture changes to the subscriber you must use this function to be able to capture correctly or you will end up like I did the first time I tried to purge the journal.
I thought I would cover what happens in case the same thing happens to you.

Back to the package, I updated “Capture_Changes” to include “Lock Subscribers” for the subscriber I was using

I updated “Reset_Journalizing” to include the “Unlock Subscribers” function and added the subscriber.

Finally I generated a scenario and added the OdiStartScen utility to the diagram, this executes the scenario again which is a much more cleaner and sensible way than just looping back to the first step, if you loop straight back it can cause problems such as the logs growing out of control. I regenerated the scenario so it took into account the last step.

Once the scenario is executed it will wait until 2 records have been updated and then run the journalizing process that captures the records, then the interfaces are run to load metadata into planning, finally the journals are purged and the process will be started again.

I know that I could of created just once interface to handle both the updates and deletes to planning by using the inbuilt functions to change the operation parameter but I just wanted to highlight what could be achieved if you wanted to split loads out.

Well that’s me done for today, time for a bit of a break I think.