Friday 4 October 2019

EPM Cloud - Integration Agent Part 1 - Introduction

One of the problems with data integration in EPM Cloud is that the list of data sources has been quite limited. Unless you want to integrate with an Oracle source like Financials, HCM or NetSuite then you are restricted to generating a file in the correct format and uploading. If there is the option of on-premise FDMEE then the hybrid functionality is a workaround for these limitations. Unfortunately, this is not viable for the majority of EPM Cloud customers.

These limiting factors have almost all been eliminated with the release of the Integration Agent which has arrived in EPM Cloud 19.10.

Here is an excerpt from the 'What’s New' documentation providing an insight to the details of the agent.

“The EPM Integration Agent, which provides connectivity between the EPM Cloud and on-premises sources including SQL data sources like Oracle and Microsoft SQL Server, E-Business Suite GL, and PeopleSoft GL, is now available:

In addition to easy access and setup for these sources, the agent may also be extended using scripting to connect to virtually any data source including 3rd party REST API based systems, or systems that can be accessed using a Jython or Groovy script. The agent, implemented as a data source for Data Management and Data Integration, responds and provides the specified information when a user executes data load rules.  Agent based integrations may be scheduled like any other data rule which provides users the flexibility to define integrations to any source in an automated fashion.”

One thing I disagree with in the first paragraph of the above statement is the reference to on-premise. The SQL data source can be in the cloud or on-premise, it doesn’t matter as long as the agent has good connectivity to it. The same goes for the agent, it can be run in the cloud or on-premise with the basic requirement that it has HTTPS connectivity to EPM Cloud.

The beauty of the agent is that with either Jython or Groovy custom scripting it goes beyond the default functionality of connecting and extracting data from SQL sources. It has opened up a whole new area of possibilities and I know there is going to be a lot of interest in it.

I personally think the release of the agent is one of the biggest enhancements in a long time for EPM Cloud integration.

In this first part I am going to give a quick overview of the agent before moving on to getting up and running with it. There is no point in overloading with too much information at this stage as it will all start to make more sense from the next post. There is a lot to cover so I am unsure how many parts there will be.

The agent is a lightweight Java application which is similar in concept to EPM Automate. As it is Java-based, it can run on most operating systems. Currently it stands at just over 6MB in size. The agent acts as a HTTP server and servlet engine with the use of Jetty, no surprises that the bulk of the functionality is built around a REST API.

In my testing I have been running the agent on a low tier AWS machine which has network connectivity to EPM Cloud and various data sources.

The agent requires a JRE to run which should be version 8+ and is not included in the download. This means either a supported JRE will need to be installed before running the agent, or if you are concerned about Java licensing restrictions and EPM Automate has been installed on the same machine, it is possible to point to the version of Java that is deployed with EPM Automate. I can imagine in a lot of cases the agent will be run on the same machine as EPM Automate.

There are configuration parameters for the agent that are defined in a file, I will be covering this in the next post.

The agent can currently be downloaded from Data Integration but I believe the longer-term plan is to move it into the downloads section but we will see.

By default, the agent will connect to relational data sources using JDBC and execute SQL queries to extract the required data set. A data extract file will be generated which is uploaded to EPM Cloud. The SQL query is defined in Data Integration. The database connection can either be set up in Data Management or in a file stored in the agent config directory.

There is also the option to extract data from EBS and Peoplesoft which are basically pre-seeded SQL queries. Though to be honest it might be better to fine tune the SQL to fit your requirements and use that instead.

There is no requirement to use the custom scripting option if you want to stick with default functionality to extract data. To be able to use custom scripting then either Jython or Groovy will need to be installed on the machine hosting the agent.

If you have used custom scripting in on-premise FDMEE then it is very similar to event scripts. The scripts allow you to override the default functionality and connect to any data source. I will cover this in a future post.

There are four events where custom scripting can be applied:
  • Before extract – Before the agent connects to the data source and executes the SQL query
  • After extract- After the agent has connected to the data source and produced an extract data file
  • Before upload – Before the data file is uploaded to EPM Cloud
  • After upload – After the data file has been uploaded to EPM Cloud.
Processes are executed in the same way from either Data Management, Data Integration, EPM Automate or REST.

It is possible to run multiple agents and cluster agents, once again I will cover this in a later post.

The agent can run in two different modes, the first being:

Asynchronous mode 

This is probably going to be the most common type of deployment as the agent only requires outbound connectivity over HTTPS to EPM Cloud, similar to EPM Automate requirements.

In this mode the agent will poll EPM Cloud using REST to check whether a load rule has been executed. The poll interval is defined in the agent configuration file.

If no rule has been executed the REST response will not return a Job ID, nothing will then happen until the next time the agent polls the cloud.

If a rule has been executed a payload is returned containing the Job ID. The agent makes another REST call with the job ID and the response contains the connectivity information, the SQL data source and the SQL query to execute.

The agent will then connect to the data source, execute the query and generate a data file. This file is uploaded to EPM Cloud using REST.

A REST call is made to inform that the process was successful.

The agent process log file is also uploaded to the cloud and appended to the main process log.

Standard functionality then takes over and the data file will be imported and validated.

Synchronous mode 

This requires EPM Cloud to connect over http(s) to the agent. This would usually be fronted with an internet facing load balancer or HTTP server which directs traffic to the EPM Agent on the designated port. The machine running the EPM Agent would not need to be internet facing and would be configured to only accept traffic from the load balancer/HTTP server.

An example being:

It does not have to follow the above design, there could be just a load balancer or a HTTP server or both. It is even possible to connect directly to the agent but that is not something I would promote for security reasons.

In my testing I ran an AWS application load balancer over HTTPS, this proxied requests to the EPM Agent machine and port. The machine hosting the agent was configured to only accept traffic from the load balancer.

What I will say is currently there is not much in the way of functionality or logging when testing the agent in synchronous mode. If you are considering this option, then you will need to work closely with a network team to configure, and put monitoring in place in case of configuration issues.

In Synchronous mode when a data load rule is executed, a REST request is made from EPM Cloud to the internet facing URL for the agent. The payload of the REST call contains the job ID. The agent then operates in the same way as Asynchronous mode. A request will be made based on the Job ID for the SQL connectivity and query information. The query is executed, file produced, uploaded to the cloud. The process status is updated, and process log file uploaded.

I think the decision of which mode to use really depends on whether you want to go down the route of configuring an internet facing solution and being able to run a process instantly or can wait for the poll interval. Remember the poll interval can be defined at agent level.

I will go into much more detail of how the asynchronous and synchronous modes work in a future post.

Communication between the agent and EPM Cloud is not only secured using HTTPS and TLS but also the REST payloads are encrypted using a public/private key management system.

Each time the agent is started it will generate a private and public key pair.

The private key is stored in the agent’s memory. The public key is uploaded to EPM Cloud and stored.

The REST payloads from EPM Cloud are encrypted using the public key. This can then be decrypted with the private key of the agent.

This means the agent can only be paired with one EPM Cloud instance and any attempt to communicate with the agent without the key and encryption will not work.

With that brief intro covered it is time to move on to the good stuff and get the agent up and running. Part two is available here.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.