Tuesday, 31 July 2012

ODI Series – Standalone Agent High Availability using OPMN

There are a few options for providing high availability with ODI agents. The usual route is to deploy and cluster two or more J2EE agents, which are then fronted by an HTTP server. This method allows for load balancing across the active/active agents and for failover of the scheduler, as only one agent should act as the scheduler.

There are situations where you may not want to go down the J2EE route because it adds complexity, and you are looking for a simpler solution that still provides some form of high availability. For instance, keeping with my tradition of the EPM world, you may have scheduled ODI routines to build or load data to Essbase databases using a standalone agent. If the standalone agent goes down and is unable to restart, you would like a method to keep the schedule intact and not lose out on the important loads.

A possible method to achieve this is to control the standalone agents using OPMN and configure OPMN to allow failover between the agents. In this scenario the agents operate in an active/passive configuration.

I am going to go through the process of getting up and running with this concept in my usual way. The first step is to get OPMN installed on the machines that are going to host the standalone agents. In the EPM world this may be the Essbase server, which makes life a little easier as OPMN will already be installed and ready to configure; I went through those steps in an earlier blog.

The prerequisite is that an ODI 11g standalone agent has been installed on two machines.

I am going to assume that OPMN is not installed and go through the whole process. I know there is an OBE available on configuring OPMN to manage ODI agents, but I think it is outdated and it tells you to install an old version of OPMN.

The easiest way to get OPMN installed is to download Oracle Web Tier Utilities; version 11.1.1.6 is the latest available at the time of writing and can be downloaded from here.

 

Select the OS, download and extract.

In the extracted structure there will be a folder called Disk1; execute setup.exe from there to start the installer.


I am not going to go through every step and will stick to the ones that are important.


Select “Install and Configure”



Select a location and home directory, ignore any warning about an application being required.


To allow OPMN to be configured you will need to select only “Oracle HTTP Server”, even though it is not required. Don’t worry, the OHS configuration in OPMN can be removed at a later stage, which makes it redundant.


Enter a path for the OPMN instance home. The default is <WebTier_Home>\instances\instance1; I changed instance1 to ODI.

The OPMN Instance Name defaults to instance1, which I changed to ODI_Instance.

The OHS Component Name can be ignored, it is just the name that will be used for OHS in OPMN.


If installing on Windows, a Windows service will be created for OPMN.


If you take a look at the processes on the machine you will notice that OPMN is running and that it has also started OHS (which is basically Apache HTTP Server).

From the command line you can check the status of OPMN using opmnctl status.


The command line tool is available in
<WebTier_Home>\instances\<instance_name>\bin


As we are not interested in OHS, it can be removed from OPMN’s control.


A component can be deleted with the command
opmnctl deletecomponent -componentName <component_name>

OPMN is now installed on the first machine, which will host the primary standalone agent. The same process is then repeated on the second machine, which I don’t need to cover.

I have an ODI standalone agent called StandAloneAgent already installed on two machines, ODIAGENT and ODIAGENTPASS, so the next step is to configure them to use OPMN.

To do this you need to edit agentcreate.properties in <ODI_HOME>\oracledi\agent\bin which is populated with default values.


I am not going to go into too much detail about updating the file as I covered that in a previous blog, but here is a quick overview.

ODI_MASTER_DRIVER
ODI_MASTER_URL
ODI_MASTER_USER
ODI_MASTER_ENCODED_PASS
ODI_SECU_WORK_REPO
ODI_SUPERVISOR
ODI_SUPERVISOR_ENCODED_PASS


These variables can be populated with the information held in odiparams.bat


INSTANCE_HOME
ORACLE_OPMN_HOME

These can be updated with the information provided when installing Web Tier Utilities, but make sure the path separator is entered as / on both Windows and Unix.

COMPONENT_NAME
PORTNO
JMXPORTNO

These variables are for the agent information which is available from the Studio or the agent start script.


A completed agentcreate.properties file would look similar to the one above.

To add the agent information to OPMN there is a script provided in the same directory called odi_opmn_addagent.bat
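As a rough sanity check before running the add-agent script, the file could be validated with something like the sketch below. The list of keys is the one given in this post; the sample values (paths and ports) are purely hypothetical and deliberately incomplete, and the helper names are my own, not part of ODI.

```python
# Keys named in this post; the values they need depend on your environment.
REQUIRED_KEYS = [
    "ODI_MASTER_DRIVER", "ODI_MASTER_URL", "ODI_MASTER_USER",
    "ODI_MASTER_ENCODED_PASS", "ODI_SECU_WORK_REPO", "ODI_SUPERVISOR",
    "ODI_SUPERVISOR_ENCODED_PASS", "INSTANCE_HOME", "ORACLE_OPMN_HOME",
    "COMPONENT_NAME", "PORTNO", "JMXPORTNO",
]
# Keys holding paths, which should use / even on Windows.
PATH_KEYS = ["INSTANCE_HOME", "ORACLE_OPMN_HOME"]


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_agent_properties(props):
    """Return a list of human-readable problems found in the properties."""
    problems = []
    for key in REQUIRED_KEYS:
        if not props.get(key):
            problems.append(f"{key} is missing or empty")
    for key in PATH_KEYS:
        if "\\" in props.get(key, ""):
            problems.append(f"{key} should use / as the path separator")
    return problems


# Hypothetical, deliberately incomplete sample (not a real installation).
SAMPLE = r"""
INSTANCE_HOME=C:\Oracle\Middleware\Oracle_WT1\instances\ODI
ORACLE_OPMN_HOME=C:/Oracle/Middleware/Oracle_WT1
COMPONENT_NAME=StandAloneAgent
PORTNO=20910
JMXPORTNO=21910
"""

for problem in check_agent_properties(parse_properties(SAMPLE)):
    print(problem)
```

In practice you would read the text from <ODI_HOME>\oracledi\agent\bin\agentcreate.properties rather than from the inline sample.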


The file will require editing before running as the OPMN_HOME and INSTANCE_NAME variables will require updating with the correct paths.


Executing odi_opmn_addagent.bat should add the agent to OPMN


You can view the agent through the OPMN command line tool. After it has been added the status will show as Down, meaning the agent has not been started. The agent can be started using

opmnctl startproc ias-component=<Agent_Name>

The status should then be displayed as Alive, though this doesn’t guarantee the agent has started up without any errors.


The logs are located in
<WebTier_Home>\instances\<instance_name>\diagnostics\logs\OPMN\opmn



If you look at the processes running on the machine you should see the agent Java process being controlled by OPMN.

One issue that can occur is that the user specified for the ODI_SUPERVISOR variable in the agentcreate.properties is ignored and defaulted to supervisor in the OPMN configuration file.


If a user other than supervisor is being used, then opmn.xml in <WebTier_Home>\instances\ODI\config\OPMN\opmn can be updated. Any changes to this file require a restart of OPMN.

Once the agent is up and running without any issues then the same configuration can be replicated on the second machine.

This now means that both agents are being controlled by OPMN, so the next step is to configure OPMN for failover.



Edit the opmn.xml file. If you are not intending to use SSL communication between the two OPMN nodes, set enabled="false" on the <ssl> element. If you are going to use SSL, the wallet file will need to be recreated on each machine.


Add in the topology and nodes list containing each of the OPMN hostnames; usually you would use the fully qualified name.


Remove numprocs="1" from the <process-set id="odi-agent"> section.


At the <process-type id="odiagent"> section add in

service-failover="1", which will enable the failover functionality

service-weight="<value>", which defines which agent has priority; a higher value means higher priority.


On the second machine the opmn.xml configuration would be exactly the same except for the service-weight value.
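The process-type and process-set changes described above could also be scripted rather than made by hand. The sketch below is one possible way to do it, not something shipped with ODI or OPMN: the sample fragment is a trimmed, hypothetical stand-in for the real agent section of opmn.xml, and the weight of 100 is an arbitrary illustrative value.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace so elements match with or without one."""
    return tag.rsplit("}", 1)[-1]


def enable_failover(root, process_type_id, process_set_id, weight):
    """Apply the failover attributes described in the post, in place."""
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "process-type" and elem.get("id") == process_type_id:
            elem.set("service-failover", "1")
            elem.set("service-weight", str(weight))
        elif name == "process-set" and elem.get("id") == process_set_id:
            # numprocs must not be present once failover is enabled.
            elem.attrib.pop("numprocs", None)
    return root


# Trimmed, hypothetical fragment; the real section contains more elements.
SAMPLE = """<ias-component id="StandAloneAgent">
  <process-type id="odiagent">
    <process-set id="odi-agent" numprocs="1"/>
  </process-type>
</ias-component>"""

root = enable_failover(ET.fromstring(SAMPLE), "odiagent", "odi-agent", 100)
print(ET.tostring(root, encoding="unicode"))
```

If you adapt this to rewrite the real file, take a backup first, as ElementTree drops comments and may rewrite namespace prefixes when saving. On the second machine you would pass a different weight.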


There are additional configuration settings available, but that is enough to get going with failover. After changes have been made the OPMN service should be restarted.


After starting the OPMN processes on both machines the agent should be active on one of the nodes, and on the passive node the agent’s status should be Down. If the agent is restarted on the passive node, OPMN should check whether there is an active agent process; if there is, the agent process will not start.


If the agent is stopped, or crashes and cannot start (I believe the default is three attempts), then it should fail over and become active on the other node. The agent will start as a scheduler, so any future scheduled jobs should be honoured.

If the scheduler has already started a job which has repeat cycles and the agent fails over, then the session and its repeat cycles will not be run on the new scheduler agent.

So all good, the agents are working as expected in an active/passive configuration. Well, there is one slight issue: if you look at the configuration of the agent, you will notice the hostname is set to the host of the agent that was active at the time.


When the agent fails over, the host in the agent configuration becomes invalid. This could be handled by updates to DNS, a VIP or hosts file entries, or there is another method that keeps the host updated, which I am sure not everybody will agree with.

OPMN has the ability to run event scripts and one of the available options is to execute a pre-start script, so it is possible to run a script just before an agent is started.


In this example prior to the agent starting the updateAgent script is executed.


The script sets the agent host name in the snp_agent table of the ODI master repository to the host that is executing the script. This is as basic as it can be, but the type of scripting and complexity used is really down to your preference.
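To give an idea of what such a script might do, here is a minimal sketch; it is not the updateAgent script itself. It assumes the snp_agent table has agent_name and host_name columns, which you should verify against your own repository version, and it uses an in-memory SQLite database purely as a stand-in so the example runs anywhere. A real script would connect to the master repository database instead.

```python
import socket
import sqlite3

# Assumed column names (agent_name, host_name); check your repository.
UPDATE_SQL = "UPDATE snp_agent SET host_name = :host WHERE agent_name = :agent"


def update_agent_host(conn, agent_name, host=None):
    """Point the named agent at this machine; returns rows updated."""
    host = host or socket.gethostname()
    cur = conn.cursor()
    cur.execute(UPDATE_SQL, {"host": host, "agent": agent_name})
    conn.commit()
    return cur.rowcount


# Stand-in database with hypothetical demo data, not a real repository.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snp_agent (agent_name TEXT, host_name TEXT, host_port INTEGER)"
)
conn.execute("INSERT INTO snp_agent VALUES ('StandAloneAgent', 'ODIAGENT', 20910)")

# Simulate the pre-start script running on the passive node.
updated = update_agent_host(conn, "StandAloneAgent", "ODIAGENTPASS")
current = conn.execute("SELECT host_name FROM snp_agent").fetchone()[0]
print(updated, current)
```

Leaving host unset makes the function use the name of the machine running the script, which is the behaviour wanted in a pre-start script. Updating repository tables directly is not something everybody will be comfortable with, so test it carefully before relying on it.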

So let’s test a failover with the pre script added.


The agent is active on ODIAGENT before failover.


The agent fails over to node ODIAGENTPASS and successfully updates the host configuration. Please note that ODI Studio does need restarting after a failover to refresh the hostname.

The event scripts could be expanded to add in logging or alerting email functionality.

If you are looking to implement high availability with ODI agents and don’t want to go down the J2EE route, then this method is certainly worth investigating. If you want any further information, feel free to contact me.

2 comments:

José Hernandez said...

Hi,

What happened to the parameters:

PROXY_PORT and MASTER_REPO_EXTERNAL_ID


Thanks

John Goodwin said...

Hi, these parameters did not exist when the blog was written.
Try the documentation - http://docs.oracle.com/cd/E28280_01/install.1111/e16453/opmn.htm#CACIEFAA