Tuesday, 8 October 2019

EPM Cloud - Integration Agent Part 3 - Extracting data

Back again to continue with the series looking at the EPM Cloud Integration Agent. In the last part I went through getting up and running with the agent in asynchronous and synchronous modes.

Now that the agent is running, it is time to look at our first integration. In this post I will go through extracting data from a remote relational database and loading to EPM Cloud.

The first example will be against an Oracle database, but I will also cover SQL Server in the next post. The good news is you are not restricted to these two database types, you have the power to connect to any source.

Before starting out, you need to be sure that you can connect to the source database from the machine running the agent. I simply confirmed this by telnetting to the Oracle database server and port.
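For example, with the default Oracle listener port the check is as simple as the following (the server name is a placeholder and 1521 is just the typical default port):

telnet oradbserver 1521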


For simplicity, I have created a database view which queries the tables and data I am interested in and produces the desired output. It doesn’t have to be a view; data can be extracted using a SQL query directly against the source tables.
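Purely as a sketch of the idea (the underlying table and column names are hypothetical; only the view name and output columns match the example used in this post), the view could be something like:

CREATE OR REPLACE VIEW V_ACTDATA AS
SELECT GL.ENTITY_CODE   AS ENTITY,
       GL.PRODUCT_CODE  AS PRODUCT,
       GL.ACCOUNT_CODE  AS ACCOUNT,
       GL.SCENARIO_NAME AS SCENARIO,
       GL.DATA_VALUE    AS AMOUNT
FROM   GL_ACTUALS GL;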


The next stage is to create a SQL query in Data Integration that will be executed by the agent to extract the data.

Go to Data Integration > Actions > Query


Add a new query.

Enter a new name for the query and the SQL statement.


You will notice that I have ~SCENARIO~ in the SQL statement. This is a bind variable and must be enclosed in tildes; it allows you to filter the data extract. The value can be set at either data load rule or application level. Bind variables can also be updated using custom event scripting, and I will provide an example in the next part.
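For reference, the full query used in this example (it also appears later in the process log) is:

SELECT ENTITY, PRODUCT, ACCOUNT, AMOUNT
FROM V_ACTDATA
WHERE SCENARIO = ~SCENARIO~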

The most important part of the SQL statement is the column names. They will have to match what is defined in the data source, which I will get on to shortly.

It is also possible to use alias names for the columns in the SQL query, so the source column names do not have to match the data source definition directly.
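For example, a query going directly against a source table could alias the columns to the required names; the table and column names below are made up purely for illustration:

SELECT ENT_CODE   AS ENTITY,
       PROD_CODE  AS PRODUCT,
       ACC_CODE   AS ACCOUNT,
       DATA_VALUE AS AMOUNT
FROM   GL_BALANCES
WHERE  SCENARIO_NAME = ~SCENARIO~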


Once the query has been created it is possible to edit, delete or create additional queries.


The next step is to create a comma-separated file with one line containing all the possible columns that can be returned from the various SQL queries. The file I have created has more columns than the SQL query above because I intend to have additional queries that return different data sets.


As I just mentioned, it is important that the names match those that will be returned from the SQL queries.
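As an illustration only, the single header line could look like the following, where the first four columns come from the query above and the remaining ones are just examples of extra columns other queries might return:

ENTITY,PRODUCT,ACCOUNT,AMOUNT,SCENARIO,VERSION,PERIOD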

The name of the file will become the name of the target application in Data Management. I know it is confusing calling it a target application when it is a source.

The mindset you need to be in is to treat the integration as a file-based integration, even though you are connecting to a relational source. The agent will execute the SQL query against the source database, extract the data and create a file; this file will contain column headers that match the columns in the SQL query. The file will be automatically uploaded to the cloud and imported, where the column headers in the file are matched to what has been defined in the cloud.

It should become clearer as I go through the full process.

To create the Data Source, go to Data Management as there is not yet full parity in Data Integration.

Go to Target Application > Add > Data Source


From the dropdown select “On Premise Database”. This doesn’t mean the database must be on-premise; it can be in the cloud or on-premise, as long as the agent has good connectivity to it.


This will open the file selection window. Upload the file with the column names and select it.


The file name has now been converted into the application name. A prefix can be added if there is already an application with the same name.


An application will be created with dimensions based on the header columns in the selected file.


I know it is a little confusing because they are really source columns and not dimensions.

The target dimension class should be left as “Generic” and no “Data Table Column Name” assignments are required. If your SQL extracts are going to include additional columns, you can manually add them to the application at any point.

Next on to the Application Filters tab which provides options to configure the connection to the source database.


Data Extract Query – This is the name of the SQL query to execute. It is possible to define this at application or rule level.

Delimiter – This will be the delimiter that is used when the agent creates the data extract file.


Credential Store – This provides the option to store the database connection information either in the cloud or in a file. If “Cloud” is selected, the connection information is entered in the application filters. If “File” is chosen, a file will need to be created in the agent’s application data config directory, in the same location as the agent initialisation file. The database user’s password in the file will need to be encrypted using the “encryptpassword” script.

For this example, I am going to go for the “Cloud” option and will cover “File” in the next example.

JDBC Driver – This is the driver for connecting to the source database. Currently there are only “Oracle” and “Microsoft SQL Server”, but I am sure this will increase in future releases. This doesn’t mean you can only connect to those database types though; with the use of event scripting you can connect to pretty much any source.


JDBC URL – This is the database connection string. The Oracle syntax I will be using is

jdbc:oracle:thin:@dbhost:port/service
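For example, with placeholder values filled in (1521 being the typical default listener port, and the host and service names made up):

jdbc:oracle:thin:@oradbserver:1521/ORCLPDB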

Username/Password are for the user you want to connect to the database with.

Fetch Size – This is the number of records that the JDBC driver fetches with each call to the database when executing the SQL query.

This is my completed set of property values.


In my SQL query I was filtering on scenario and included a bind variable. To create a bind variable, select “Edit”.


This will then display all the application filters.


A new filter is added for the bind variable. The name must match the one in the SQL query. The prompt is the text that will be displayed at the chosen display level, which can be application and/or rule.


The display order should be a value which is not already being used; it determines the order in which the filters are displayed in the UI. The validation type should be set to “None”.

I also updated the display level for the data extract query to be “Application and Rule”. This means the query defined at application level can be overridden in the data load rule.

The integration can now be built up as if building a standard file-based one.


The delimiter should match what has been set in the application filters.

To keep things simple, I have used expressions to map the source values directly to the target, so there is no need for member mappings.

The data load rule includes the bind variable and data extract query as they were set at rule level.


I set scenario to “Actual”, which means this value will be used for the bind variable when the SQL query is executed.

The equivalent in Data Integration is in Options and part of the Filters tab.


Now we are ready to run the data load rule. The agent is currently running in asynchronous mode and polling the cloud every 30 seconds.


The import stage for the integration is run.


In process details you will see the process is running.


In the process log you can view the database extract process.

10:00:56,078 INFO  [AIF]: ************************************************************************************
10:00:56,078 INFO  [AIF]: *On Prem DB Extract: Started                                               *
10:00:56,078 INFO  [AIF]: ************************************************************************************
10:00:56,203 DEBUG [AIF]: Adding DB credentials
10:00:56,204 DEBUG [AIF]: Adding DB bind variables
10:00:56,205 DEBUG [AIF]: Adding DB query
10:00:56,207 INFO  [AIF]: Original Extract Query:SELECT ENTITY,PRODUCT,ACCOUNT,AMOUNT 
FROM V_ACTDATA 
WHERE SCENARIO=~SCENARIO~
10:00:56,207 DEBUG [AIF]: Adding delimiter
10:00:56,208 DEBUG [AIF]: Adding fetchSize
10:00:56,212 INFO  [AIF]: Retrieved EPM Cluster name:EPMAGENT_ASYNC
10:00:56,213 INFO  [AIF]: Retrieved Cluster Mode:ASYNC
10:00:56,216 INFO  [AIF]: Calling agent extract ASYNC mode: BEGIN
10:00:56,216 INFO  [AIF]: Request Payload Length:469
10:00:56,231 DEBUG [AIF]: Updating process steps: QUEUED
10:00:56,234 DEBUG [AIF]: Agent Status: QUEUED

The database credentials, SQL query and bind variables, delimiter and fetch size are collated. This information will be passed to the agent once it requests it.

The cluster name and mode are retrieved, and the process is queued until the agent makes its next poll to the Cloud.


The agent makes a poll to the Cloud and, as a process has been initiated, the response from the Cloud returns the Process (Job) ID.

As a Process ID has been returned the agent now makes a request to the Cloud for the details of the job. The information collated earlier is returned to the agent (all encrypted).

The agent will connect to the source database, execute the SQL query with the bind variable value, create an extract data file, upload the file to the Cloud, and upload the process log.

The details of this process are contained in the agent process log. The log includes the SQL query and the bind variable value.

10:01:06:086 :: *************** Starting execution for sessionId:945 at: Oct-06 10:01:06:086***************
10:01:06:086 :: Process Extract Request with data:--------------Executor Request Data--------------
 jobType:EXTRACT
 sessionID:945
 Data Source Name:EPMAGENT_ORA_DEMO
 credStore:CLOUD
 query:SELECT ENTITY,PRODUCT,ACCOUNT,AMOUNT 
FROM V_ACTDATA 
WHERE SCENARIO=~SCENARIO~
 outputType:FILE
 delimiter:,
--------------Bind Variables--------------
Name: SCENARIO, Value:Actual, Type:STRING
--------------------------------------------
10:01:06:086 :: --------------- Executing Custom Event :BefExtract ---------------
10:01:09:539 :: Script: C:\EPMAgent\appdata\scripts\BefExtract.py Not found.
10:01:10:148 :: Adding String Bind Variable: SCENARIO with value: Actual
10:01:10:227 :: Writing extract output to file: C:\EPMAgent\appdata\data\945.dat
10:01:10:258 :: --------------- Completed execution for sessionId:945 ---------------
10:01:10:258 :: --------------- Executing Custom Event :AftExtract ---------------
10:01:10:258 :: Script: C:\EPMAgent\appdata\scripts\AftExtract.py Not found.
10:01:10:258 :: --------------- Executing Custom Event :BefUpload ---------------
10:01:10:258 :: Script: C:\EPMAgent\appdata\scripts\BefUpload.py Not found.
10:01:10:258 :: Commencing file upload to cloud for file:C:\EPMAgent\appdata\data\945.dat
10:01:11:602 :: File Upload successfully complete
10:01:11:602 :: --------------- Executing Custom Event :AftUpload ---------------
10:01:11:602 :: Script: C:\EPMAgent\appdata\scripts\AftUpload.py Not found.
10:01:12:148 :: Commencing file upload of log file:C:\EPMAgent\appdata\logs\945.log
10:01:12:242 :: *************** End execution for sessionId:945 ***************

The agent has been started with the Jython home variable populated.


As this has been populated, the agent will attempt to run the four event scripts. I have not created any event scripts yet, so none will be executed. I will start introducing event scripts from the next post.

If the Jython home property had been left blank, the agent would not attempt to run the scripts.

If the Groovy home property was defined, the agent would attempt to run the four Groovy event scripts.

The process log is generated in the application data logs folder along with the main agent log. The format of the log file name is <process id>.log


The agent process log is uploaded to the Cloud and appended to the main process log.

INFO  [AIF]: Appending EPM agent log to process: BEGIN
INFO  [AIF]: Appending EPM agent log to process:: END

The data extract file is generated in the application data folder. The format of the data file name is <process id>.dat


The extract file will include column headers.


The column headers are matched in the Cloud. This is why it is important that the columns returned by the SQL query are the same as the ones defined in the application.

DEBUG [AIF]: EPMFDM-140274:Message - Datasource Column Position in header: AMOUNT -> 4

The remaining steps of the import and validate process will be the same as a file-based integration.


For each cluster there will be additional queue and process information available in Data Integration.


If the agent is running in synchronous mode, the only difference is that the Cloud makes a direct call to the agent through the Web URL to execute the job. The process ID is sent directly to the agent, so there is no waiting for a poll.


So now we can extract data from a remote Oracle database and load this to EPM Cloud. This is all done using default functionality and no custom scripting.

In the next part I will extract data from a SQL Server database and start to look at event scripts.

Sunday, 6 October 2019

EPM Cloud - Integration Agent Part 2 - Get up and running

Moving swiftly on to the next part in this series on the EPM Cloud Integration Agent. In the last part I went through an introduction to the agent and in this post, I am going to dive straight in and go through the process of getting the agent up and running.

I will be going through an example on a Windows machine, but the concept will be similar on Linux/Unix type operating systems.

A summary of the tasks is:
  • Install JRE 8+
  • Install Groovy or Jython (optional)
  • Download Agent
  • Extract Agent
  • Create application folders
  • Generate encrypted password for EPM Cloud
  • Populate agent initialisation file
  • Copy EPM Cloud SSL certificate to agent folder
  • Start agent
The agent requires a Java Runtime Environment (JRE) 8+ to run. JRE 1.8 update 221 was downloaded and installed.


If you want to use custom scripting and override default functionality in the form of event scripts, then either Groovy or Jython can be installed. If you want to stick with the default functionality and no scripting, then there is no need to install. It is also possible to implement your own Java classes without the need for Groovy/Jython.

I am not going to say which to install, as it is personal preference or maybe company policy that defines this. If you work with on-premise FDMEE then you are probably more used to Jython; for this reason I am going to install Jython, but will no doubt cover both in future blog posts.

Currently the latest version of Jython is 2.7.1 and can be downloaded here.


To install Jython, the jar file can simply be executed, and the installation steps followed.


Next, the agent needs to be downloaded from Data Integration.

Go to Data Exchange.

Select Actions and there will be an option to download the agent.


Once downloaded, the zip file simply needs to be extracted to a suitable location.


The bin directory is where you need to start to configure and run the agent. There are scripts available for both Windows and *nix type platforms.


“createAppFolder” – will create the application folder structure; it can be generated in the same location as the agent or in a separate one. It is possible to create multiple application folders if you want to run more than one agent and keep them segregated.

For simplicity, I am going to create the application folders in the same location as the agent in a directory named “appdata”.


Running the script will create the following structure.
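In my case that means the following directories are created under the application data folder (each is described below):

C:\EPMAgent\appdata\config
C:\EPMAgent\appdata\data
C:\EPMAgent\appdata\logs
C:\EPMAgent\appdata\scripts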


The config directory contains an initialisation file for the start parameters of the agent. This will need populating before starting the agent, and I will get on to that shortly.


The data directory will hold the data files that are created after extracting the source data. The files will be uploaded to EPM Cloud as part of the integration process. You will need to consider housekeeping to maintain this directory over time.

The logs directory will contain a log for the agent and each process, the process log also gets uploaded to EPM Cloud and appended to the main process log.

I will be covering the data and logs in the next part when I look at extracting data.

The scripts directory can contain any of the four available event scripts. As I mentioned earlier, event scripts are optional and once again I will be covering them in a later post.

Back to the agent initialisation file, which from now on I will refer to as the ini file.


The file will be automatically populated with the EPM_AGENT_HOME and EPM_APP_DATA_HOME property values.

The EPM_APP_DATA_HOME directory is the one created earlier with the “createAppFolder” script.

AGENT_NAME is a unique name for the agent and can only contain alphanumeric characters. It is optional to define it in the file, as it can be set at runtime.

CLUSTER defines the name of the cluster that the agent belongs to. Multiple agents can belong to a cluster, and there can be one or many clusters. Like the agent name, it can only contain alphanumeric characters.

Clusters provide a way of organising integrations. It may be that you want to use a cluster for only certain integration types, for example one cluster for extracting data from an Oracle database and another for connecting to other cloud providers. You can assign rules, locations or target applications to clusters. I plan on covering this in more detail in a future post.

Clusters are created in Data Management and can either be defined to operate in synchronous or asynchronous modes. I gave an overview of these modes in the previous post.

A cluster must exist before starting the agent. By default, the ini file will be populated with a cluster named “EPMCLUSTER”, this cluster also pre-exists in Data Integration.

Just for demo purposes I am going to have two clusters, one for each type of mode.

To create, view, edit and delete clusters, first go to Data Exchange.


Then select Agent.


There will already be a cluster named “EPMCLUSTER”, which has been assigned to asynchronous mode. I deleted this cluster, but it is your choice whether you want to keep and use it.


Now a cluster can be added.


Two clusters were created, one for each mode.


I am going to keep it simple and configure the agent in asynchronous mode first, then update the ini file and restart the agent in synchronous mode.


I could have easily had two ini files, one for the agent in asynchronous and the other for synchronous.

Moving on to the next property in the ini file.

PORT is the port the agent will run on and the default is 9090.

CLOUD URL, DOMAIN and credentials should be easy to understand.

You will, however, need to encrypt the user’s password with the “encryptpassword” script in the agent bin directory.


The encrypted password must be manually copied to the ini file. It is worth noting that every time the cloud password expires or is changed, the password will need to be encrypted again and the ini file updated.

It is a shame that the encryption method is not the same as EPM Automate. This means if you are running EPM Automate on the same machine as the agent, you will need to manage multiple encrypted passwords. In a perfect world, the encrypted password could be held in one place and be shared between the agent and EPM Automate.


On to the remaining properties.

LOG_LEVEL can be either ALL, INFO or ERROR.

POLL_INTERVAL is for asynchronous mode only; it is the time interval in seconds at which the agent checks whether a process has been initiated in the cloud.

REGISTER_WITH_IP defines whether to register the agent against the host name or IP address.

EXECUTION_POOL_SIZE is the number of executions that the agent can run in parallel.

JYTHON_HOME and GROOVY_HOME are optional; they specify the installation locations. Earlier I installed Jython to C:\jython2.7.1\ so this is used for the JYTHON_HOME parameter value.

CUSTOM_MEM_ARGS sets the minimum and maximum JVM memory sizes. The default values are just a guide and can be tuned after monitoring memory usage from running integrations.

CUSTOM_JAVA_OPTIONS allows you to add additional Java parameters, for example if the agent needs to go through a proxy to reach EPM Cloud.
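In the proxy case, for instance, the standard JVM proxy settings could be supplied; the host and port below are made up for illustration:

CUSTOM_JAVA_OPTIONS=-Dhttps.proxyHost=proxy.mycompany.com -Dhttps.proxyPort=8080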

CUSTOM_CLASS_PATH provides the option to add additional jar files to the Java classpath, for example additional JDBC drivers.

CUSTOM_INTERFACE_CLASS_NAME is only required if custom Java classes are implemented to override default functionality through events.

Here is an example of my completed ini file.
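In outline it looks something like the sketch below; treat it as illustrative only, as the cloud connection properties and values such as the agent name, pool size and memory settings are placeholders rather than my real ones:

EPM_AGENT_HOME=C:\EPMAgent
EPM_APP_DATA_HOME=C:\EPMAgent\appdata
AGENT_NAME=EPMAGENT1
CLUSTER=EPMAGENT_ASYNC
PORT=9090
CLOUD_URL=https://<instance>.oraclecloud.com
CLOUD_DOMAIN=<identity domain>
CLOUD_USER_NAME=<cloud user>
CLOUD_PASSWORD=<encrypted password>
LOG_LEVEL=ALL
POLL_INTERVAL=30
REGISTER_WITH_IP=false
EXECUTION_POOL_SIZE=2
JYTHON_HOME=C:\jython2.7.1
GROOVY_HOME=
CUSTOM_MEM_ARGS=-Xms128m -Xmx512m
CUSTOM_JAVA_OPTIONS=
CUSTOM_CLASS_PATH=
CUSTOM_INTERFACE_CLASS_NAME=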


There is one remaining task required before being able to start the agent. The EPM Cloud SSL certificate needs to be exported and copied to the agent cert folder.

I am going to go through an example using Chrome, but it is a similar process for other browsers.

Log into one of your EPM Cloud instances and click the padlock icon.

Select Site settings > Certificate


This will open the certificate


Click the Details tab and select ‘Copy to File’.


Save the file to the agent cert folder.


Please be aware that the SSL certificates will expire, so make a note of the valid-to date; you will need to replace the certificate close to the expiration date or the agent will stop working.

Now we are ready to run the agent in asynchronous mode.

Update 19.12 release - The agent can now be run as a Windows service. I have put together a post about this and you can read all about it here.

If you want to start the agent from command line, then you can use the “epmagent” script in the agent bin directory.


The syntax is:

epmagent <path to ini>

If the agent name is not defined in the ini file, or you want to override it, you can set the agent name at runtime with the following syntax:

epmagent <agent name> <path to ini>

When the agent starts up, the initialisation file and SSL certificate will be read. The agent will be registered in the cloud against the cluster.


If you go back into Data Integration and select the cluster you will see the agent has been registered.


The Physical URL will be the machine and port the agent is running on.

In asynchronous mode the Web URL is not important, this only comes into play with synchronous mode, which I will get on to shortly.

As the agent is running in asynchronous mode, it will poll the cloud to check whether any processes have been initiated. I had set the interval poll to be 30 seconds.


Each time a poll is made, the “Last Ping” will be updated in the Cloud. At least this gives some indication that the agent status is healthy.


The agent name can be defined at start-up which will override the ini file.


There will now be two agents registered against the cluster even though only one is running.


Agents will stay registered against a cluster; if they are no longer required, they can simply be deleted from the cluster in Data Integration.

You might be wondering whether there is an option available to run the agent as a Windows service. I have raised this and apparently it is being looked into, so it might appear in a future release.

Let us move on to synchronous mode. I am just going to update the ini file to change the cluster name. I could have run both the agents at the same time, but I will be covering multiple agents, clusters and assignments in more detail in a future post.


I restarted the agent so it will pick up the new cluster name.


You can see that the agent mode is now synchronous.

In Data Integration the agent will be registered.


In the last post I provided an overview of the two modes, so please refer to that for more details. As the agent is running in synchronous mode, the cloud requires an inbound connection to the agent over HTTPS. This would usually be achieved by running a load balancer or HTTPS reverse proxy with an internet-facing URL, which proxies through to the agent host and port.

In my testing, I have a domain name routing to an AWS application load balancer. The Load Balancer has an SSL certificate applied which is registered against the domain name. The load balancer has been configured to only accept HTTPS traffic from Oracle Cloud, this is then proxied to the machine running the agent. The machine running the agent only accepts traffic on the agent port from the Load Balancer.

As you can see, when the agent is first registered with the cloud the Web URL is the same as the Physical URL. The Web URL can be updated by typing the internet facing URL for the agent.


It would be good if there were an option to also set the Web URL from the agent side when it starts up. It is possible to update it using REST, which will be covered in a separate blog post.

At this point there is no option to check whether EPM Cloud can connect to the agent. It is not until you run an integration that you will find out, and to be honest, depending on the problem, there is not a great amount of detail in the logging. Personally, I think there should be an option to test the agent and check the connectivity. Maybe one for a future release?

It is possible to put the internet-facing URL for the agent into a browser and check whether you get a valid response, though this doesn’t guarantee that it will work from the cloud.


You really need to decide whether you need synchronous over asynchronous. Are you happy to wait for the polling interval, which can be configured, or do you need processes to run instantly? Synchronous mode will certainly require working closely with a network team.

Anyway, now that we have the agent running it is time to move on to extracting data. Watch out for the next part arriving very soon.