Sunday, 14 June 2009

ODI Series – Processing all files in a directory

There have been a number of occasions where it has been necessary to process all files in a directory, the files are all of the same structure but could be for different weeks or months and all need to be processed in one session.

Now there are probably a number of ways going about this and I know it might not be your preferred route (just thought I would say it before I get the usual messages from the purists :) ) but this is a method that I believe is worth sharing and it is one I can look back upon once I forget how it is done.

The object of this example is to pick up all the files in a directory and load them into a database table.



An example of the file structure is as follows



The first step is to create a new Datastore holding the structure of one of the files (remember all files must be of the same structure)



I have pointed the Datastore directly to one of the files at the moment, this will be replaced later by using a variable, as we need to load files with different names.

In this example I am loading into a database table so I have already created the table to hold the data and reversed it.



Ok, so now we need an interface to load the data from the file straight into the db table.



Test it works and populates the table.



So now we can load the data we just have to loop through all the files.
Next is to create a variable that will store the current filename.



And now go back to the file Datastore and place the variable name as the resource name, remember variables in ODI start with #
Remember the variables are case sensitive as well.



So this means when the variable is updated with a filename it will load that file in the interface. Though it is important to know that you won’t just be able to run the interface now as the variable has to be declared and this is usually done through the use of a package.

Lets create the package then, all that is required is to declare the variable and then execute the interface to load the data from the file.



Once that has been applied we also need a scenario generated so we call the package when required, right click the package > “Generate Scenario” and accept the default configuration values.



Next we want generate a list of the files that need to be processed and to do this we can use an O/S command, I am working on a windows O/S so you will need to change the command a little if you are using a different O/S.



What this command does is to list all the files in the specified directory and writes the output to a file in another directory, I have used a different directory to write the list to, as I want to keep it separated from the other files that will be processed.

The extra parameters /b means it will run in bare format mode so there are no heading or summary info.

/a:-d means use an attribute of no directories so no sub directory information will be written to the file.

Running the command produces this file.



To be able run this command in ODI a new procedure was created



In the details tab a new command was inserted using the technology “Operating System” and the command pasted into the command window.



If you don’t want to hard code directory names then it is possible to use variables.
You can test whether the procedure produces the file by executing it.

The next thing to do is created a new Datastore pointing to the file just created; it will just contain one column for the filenames.



Ok, now we want to be able to read in all the filenames, store current filename in the filename variable and then call the package to load the data passing in the value of variable.

To do this we need to add another command step to the procedure that was created earlier.



In the Source what we want to do is retrieve all the filename information from the Datastore we just created, this means File technology was selected and the schema that points to the correct directory location.

In the command area we want to enter the command to retrieve the information, I originally thought I could just use SQL like the following,

SELECT loadfile FROM Filenames

When I tried to execute it I got an error about invalid format description, now I am not totally sure if I went about this correct way but what I did was create a quick interface to load the file information into a db table.



I executed the interface and then checked the operator to find what code was generated to retrieve the data from the file.



As you can see ODI generates extra information between /* */ which all the information about the DataStore but in ODI terminology.

So what I did was lift this information and put it into the procedure command window. I am sure this can be achieved by one of ODIs substitution methods but I am not clear which one could be used, maybe I will get some suggestions?



So this will retrieve all the filenames record by record and what we want to do on the target is call the scenario to load the file and pass in the current value of “loadfile”

To do this we can use one of the ODI tools OdiStarScen



So what is happening here is the Scenario “LOADSUMMARYFILES” is executed and the value passed into the variable currFileName is from the current value held on the source “loadfile”
When using a variable make sure you include the project name as well e.g. DEV.currFileName

So executing the procedure will retrieve a filename and then use that filename in the interface to load the data from the file into the db table, it will move on to the next filename and repeat the process until it has processed all the filenames.



You can easily add the procedure into another package to carry out further processes or make it more sophisticated by checking for when the files arrive and then processes them and deleting/archiving them when complete.

6 comments:

Deca said...

Hi John. Really nice blog, only a comment.

if you have different target you have to customize the lkm

konda.....rocks.... said...

Hi John,

Iam new to this blogspot but i know about you earlier and i have seen very good posts..

Here is a question its not related to the post above.

Iam struggling to resolve from past one week.

Iam loading an ASO cube using Essbase integration services. Iam ending up with a data load buffer error.could you explain what load buffer is..

Thu Jun 25 12:10:34 2009 /IS//0x0/1245947191/Error/1003007/Build-AIS93100B113

EssbaseAPI: Data Value [0] Encountered Before All Dimensions Selected, [0] Records Completed

Thu Jun 25 12:10:34 2009 /IS//0x0/1245947191/Error/1270040/Build-AIS93100B113

EssbaseAPI: Data load buffer [3] does not exist

Thu Jun 25 12:10:34 2009 /IS//0x0/1245947191/Informational/0/Build-AIS93100B113

IS Error(EssLoadBufferTerm): Data load buffer [3] does not exist 1270040

Thu Jun 25 12:10:34 2009] /IS//0x0/1245947191/Error/0/Build-AIS93100B113

@@@@@Thread_Target 20 being exited with code 1270040

Thanks,
Ram

Anonymous said...

Hi John ,Still i am not clear with the steps ,how to process files from a folder or directory

Anonymous said...

very good post,

thanks

Anonymous said...

Great post and we have followed something similar and it works very well.
We found an issue when we upgraded to 6.1.0_45 of java on windows platform that the OS command fails to work and throws a Java error.
It works fine on prior java releases for example 6.1.0_37 and if also works using odioscommand as a workaround under _45.
The readme of _45 indicates that there may be issues with application on windows use OS command.

Unknown said...

Hi
Thank you for your great post. It helped me so much.
You are like hero for me. I never forget you.
God bless you.

A.H.Meisami
From Asia, Iran, Tehran