Today is going to be a quick blog on how to create user functions as it is something I have not covered in the past and they can be quite useful, if you have used user functions then this is probably going to be no use to you and you may as well do something far more interesting instead.
User functions allow you define a customised function that can be used in interfaces or procedures, the function can be used across different technologies. They are useful when you use a manually created function across many interfaces; it gives you the benefit that you don’t have to keep typing all the code into each interface.
They work on the same principle as functions in other applications or programming languages, you pass variables into the function, the variables are used to say calculate a value and in the end a value is passed back from the function to the caller.
A function can be created as global so it is common to all projects or it can be defined at project level.
I am going to go through an example of a user function I had to create recently and go the process of how it was defined, it is nothing complex but I find useful.
Here is the conundrum, I have a standard variable that is a text data value e.g. 01/06/09 (UK date format), and this variable is declared when a scenario is executed. I need to generate a numeric value corresponding to the month of the variable and then use it in a calculation. If the month is April then I need to return 1, if the month is March then I need to return 12. Just to add to that the current month to be used in the function is always the prior month of the month entered in the variable, so if it is May in the variable then I want to use April in the function.
This example is based on using Oracle technology.
Before I created the function I had the following code placed in a mapping column on the target Datastore of the interface
So what it is basically doing is converting ‘01/06/09’ to a date format, it is using the function add_months to get the previous month, it then converts the date back to a string, it then uses the CASE function to convert the month to a numerical value.
As I didn’t want to hard code the date I used a variable to hold the current month.
So it could of ended up like
This is fine but it was a piece of code that I was going to use again and again so the best way to handle this was by using a user function.
In this example I am creating a project based function, it is exactly the same process to create a global variable though it is created by going to the Others tab in the Designer.
The fields are :-
Name: Name of the user function e.g. ReturnMonthVal
Group: Group of the user function. If you type a group name that does not exist, a new group will be created with this group name when the function is saved.
Syntax: Syntax of the user function that will appear in the expression editor; The arguments of the function must be specified in this syntax.
The format of the syntax is
<Function Name>($(Variable),$(Variable2)….)
As I am only passing in one value the format is
ReturnMonthVal($(MonthVar))
It doesn’t have to be called MonthVar it can be named anything that is meaningful.
In the Implementations tab click add, now you can enter the code for the function, so I just pasted the code I had used earlier in the interface mapping, the date had to be updated to the variable name defined in the first Syntax screen.
As this example is using Oracle then it was chosen under the Linked Technologies, if the code was acceptable in its current format for other technologies then they could also be chosen.
If the function was going to be used on different technologies and required different code then this can easily be done by just adding a new implementation, entering the code and applying it to a technology.
To use the function in an interface mapping then it can be achieved with
The objective was to pass a variable to the function so the variable created earlier can be used.
Remember though if you are going to use a variable you will need to declare it say in a package, now the function can be used throughout various integrations without having to rewrite the code each time
So there you have it a quick simple example of how to define and put into practice a user function.
A delve into the disturbing world of EPM, uncovering the mysteries and facts beneath the shiny surface, yawn your way through yet another blog, let's hope this one is interesting...
Sunday, 28 June 2009
Sunday, 14 June 2009
ODI Series – Processing all files in a directory
There have been a number of occasions where it has been necessary to process all files in a directory, the files are all of the same structure but could be for different weeks or months and all need to be processed in one session.
Now there are probably a number of ways going about this and I know it might not be your preferred route (just thought I would say it before I get the usual messages from the purists :) ) but this is a method that I believe is worth sharing and it is one I can look back upon once I forget how it is done.
The object of this example is to pick up all the files in a directory and load them into a database table.
An example of the file structure is as follows
The first step is to create a new Datastore holding the structure of one of the files (remember all files must be of the same structure)
I have pointed the Datastore directly to one of the files at the moment, this will be replaced later by using a variable, as we need to load files with different names.
In this example I am loading into a database table so I have already created the table to hold the data and reversed it.
Ok, so now we need an interface to load the data from the file straight into the db table.
Test it works and populates the table.
So now we can load the data we just have to loop through all the files.
Next is to create a variable that will store the current filename.
And now go back to the file Datastore and place the variable name as the resource name, remember variables in ODI start with #
Remember the variables are case sensitive as well.
So this means when the variable is updated with a filename it will load that file in the interface. Though it is important to know that you won’t just be able to run the interface now as the variable has to be declared and this is usually done through the use of a package.
Lets create the package then, all that is required is to declare the variable and then execute the interface to load the data from the file.
Once that has been applied we also need a scenario generated so we call the package when required, right click the package > “Generate Scenario” and accept the default configuration values.
Next we want generate a list of the files that need to be processed and to do this we can use an O/S command, I am working on a windows O/S so you will need to change the command a little if you are using a different O/S.
What this command does is to list all the files in the specified directory and writes the output to a file in another directory, I have used a different directory to write the list to, as I want to keep it separated from the other files that will be processed.
The extra parameters /b means it will run in bare format mode so there are no heading or summary info.
/a:-d means use an attribute of no directories so no sub directory information will be written to the file.
Running the command produces this file.
To be able run this command in ODI a new procedure was created
In the details tab a new command was inserted using the technology “Operating System” and the command pasted into the command window.
If you don’t want to hard code directory names then it is possible to use variables.
You can test whether the procedure produces the file by executing it.
The next thing to do is created a new Datastore pointing to the file just created; it will just contain one column for the filenames.
Ok, now we want to be able to read in all the filenames, store current filename in the filename variable and then call the package to load the data passing in the value of variable.
To do this we need to add another command step to the procedure that was created earlier.
In the Source what we want to do is retrieve all the filename information from the Datastore we just created, this means File technology was selected and the schema that points to the correct directory location.
In the command area we want to enter the command to retrieve the information, I originally thought I could just use SQL like the following,
SELECT loadfile FROM Filenames
When I tried to execute it I got an error about invalid format description, now I am not totally sure if I went about this correct way but what I did was create a quick interface to load the file information into a db table.
I executed the interface and then checked the operator to find what code was generated to retrieve the data from the file.
As you can see ODI generates extra information between /* */ which all the information about the DataStore but in ODI terminology.
So what I did was lift this information and put it into the procedure command window. I am sure this can be achieved by one of ODIs substitution methods but I am not clear which one could be used, maybe I will get some suggestions?
So this will retrieve all the filenames record by record and what we want to do on the target is call the scenario to load the file and pass in the current value of “loadfile”
To do this we can use one of the ODI tools OdiStarScen
So what is happening here is the Scenario “LOADSUMMARYFILES” is executed and the value passed into the variable currFileName is from the current value held on the source “loadfile”
When using a variable make sure you include the project name as well e.g. DEV.currFileName
So executing the procedure will retrieve a filename and then use that filename in the interface to load the data from the file into the db table, it will move on to the next filename and repeat the process until it has processed all the filenames.
You can easily add the procedure into another package to carry out further processes or make it more sophisticated by checking for when the files arrive and then processes them and deleting/archiving them when complete.
Now there are probably a number of ways going about this and I know it might not be your preferred route (just thought I would say it before I get the usual messages from the purists :) ) but this is a method that I believe is worth sharing and it is one I can look back upon once I forget how it is done.
The object of this example is to pick up all the files in a directory and load them into a database table.
An example of the file structure is as follows
The first step is to create a new Datastore holding the structure of one of the files (remember all files must be of the same structure)
I have pointed the Datastore directly to one of the files at the moment, this will be replaced later by using a variable, as we need to load files with different names.
In this example I am loading into a database table so I have already created the table to hold the data and reversed it.
Ok, so now we need an interface to load the data from the file straight into the db table.
Test it works and populates the table.
So now we can load the data we just have to loop through all the files.
Next is to create a variable that will store the current filename.
And now go back to the file Datastore and place the variable name as the resource name, remember variables in ODI start with #
Remember the variables are case sensitive as well.
So this means when the variable is updated with a filename it will load that file in the interface. Though it is important to know that you won’t just be able to run the interface now as the variable has to be declared and this is usually done through the use of a package.
Lets create the package then, all that is required is to declare the variable and then execute the interface to load the data from the file.
Once that has been applied we also need a scenario generated so we call the package when required, right click the package > “Generate Scenario” and accept the default configuration values.
Next we want generate a list of the files that need to be processed and to do this we can use an O/S command, I am working on a windows O/S so you will need to change the command a little if you are using a different O/S.
What this command does is to list all the files in the specified directory and writes the output to a file in another directory, I have used a different directory to write the list to, as I want to keep it separated from the other files that will be processed.
The extra parameters /b means it will run in bare format mode so there are no heading or summary info.
/a:-d means use an attribute of no directories so no sub directory information will be written to the file.
Running the command produces this file.
To be able run this command in ODI a new procedure was created
In the details tab a new command was inserted using the technology “Operating System” and the command pasted into the command window.
If you don’t want to hard code directory names then it is possible to use variables.
You can test whether the procedure produces the file by executing it.
The next thing to do is created a new Datastore pointing to the file just created; it will just contain one column for the filenames.
Ok, now we want to be able to read in all the filenames, store current filename in the filename variable and then call the package to load the data passing in the value of variable.
To do this we need to add another command step to the procedure that was created earlier.
In the Source what we want to do is retrieve all the filename information from the Datastore we just created, this means File technology was selected and the schema that points to the correct directory location.
In the command area we want to enter the command to retrieve the information, I originally thought I could just use SQL like the following,
SELECT loadfile FROM Filenames
When I tried to execute it I got an error about invalid format description, now I am not totally sure if I went about this correct way but what I did was create a quick interface to load the file information into a db table.
I executed the interface and then checked the operator to find what code was generated to retrieve the data from the file.
As you can see ODI generates extra information between /* */ which all the information about the DataStore but in ODI terminology.
So what I did was lift this information and put it into the procedure command window. I am sure this can be achieved by one of ODIs substitution methods but I am not clear which one could be used, maybe I will get some suggestions?
So this will retrieve all the filenames record by record and what we want to do on the target is call the scenario to load the file and pass in the current value of “loadfile”
To do this we can use one of the ODI tools OdiStarScen
So what is happening here is the Scenario “LOADSUMMARYFILES” is executed and the value passed into the variable currFileName is from the current value held on the source “loadfile”
When using a variable make sure you include the project name as well e.g. DEV.currFileName
So executing the procedure will retrieve a filename and then use that filename in the interface to load the data from the file into the db table, it will move on to the next filename and repeat the process until it has processed all the filenames.
You can easily add the procedure into another package to carry out further processes or make it more sophisticated by checking for when the files arrive and then processes them and deleting/archiving them when complete.