Sunday, 16 April 2017

EPM Cloud - masking data

One of the new features in release 17.04 is the ability to mask data using the EPM Automate utility, which also means it should be available through the REST API. The release documentation provides the following information:

“A new version of the EPM Automate Utility is available with this update. This version includes the maskData command, which masks all application data to ensure data privacy. You use this command only on test instances to hide sensitive data from users; for example, application developers, who should not access your business data.”

While the documentation says to use it only on test instances, it should be possible to run the command on a production instance, though in reality you probably wouldn’t want to do that anyway.

The command is available for FCCS, PBCS, ePBCS and TRCS.

It is worth pointing out before using the command that it will update all data and make it meaningless, so make sure you have a backup such as a maintenance snapshot and run it against the correct instance, as there is no going back.

The EPM Automate command syntax is:

epmautomate maskdata -f

-f is optional and suppresses the prompt asking the user to confirm whether to run the command. You would only really use the parameter if you are automating the process of masking data.

So let’s give it a test drive and run the command.

The output from the command gives us a little more insight into what is going on behind the scenes. As this is Software as a Service, we rarely get to understand the full story of what is happening.

The process flow for each database in the application is:
  • Export all data in masked format
  • Clear all data from cube
  • Load masked data back into the cube
This applies to all cubes within the application, including ASO. For BSO it is a full export, so depending on the size of the databases the command could take a while to complete. I will go into more detail shortly on how I believe the masking of data is being achieved.
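As a rough illustration of that three-step flow, here is a minimal Python sketch. It is only an in-memory model, not what the cloud service actually runs: the cube representation (a dict of cell coordinates to values), the function name and the cube name "Plan1" are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model: a cube is a mapping of cell coordinates to values.
Cube = Dict[Tuple[str, ...], float]


def mask_application(cubes: Dict[str, Cube], mask: Callable[[Cube], Cube]) -> None:
    """Apply the export / clear / load sequence to every cube in place."""
    for cube in cubes.values():
        exported = mask(dict(cube))  # 1. export all data in masked format
        cube.clear()                 # 2. clear all data from the cube
        cube.update(exported)        # 3. load the masked data back into the cube


# Example with a made-up cube and a trivial masking function.
cubes = {"Plan1": {("Jan", "Sales"): 100.0, ("Feb", "Sales"): 250.0}}
mask_application(cubes, lambda cube: {coords: 1.0 for coords in cube})
```

The masking function is passed in because, as covered later in the post, BSO and ASO cubes are masked differently.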

The following form shows an example of the data before running the mask data command.

After running the command the data has been masked and is now totally meaningless.

At the start of this post I said that the functionality should be available through REST. If you are not aware, the EPM Automate utility is basically built on top of REST, so if the functionality is in the utility then it should be possible to access it through REST.

The URL format for the REST resource is:


I can replicate this with a REST client using a POST method.

The above response includes a URL which can then be accessed to check the status of the data masking.

A status of -1 means the process is running and a status of 0 indicates the process has completed; any other value means the process has failed.

For automation purposes this can be easily scripted in your preferred language. I have put together an example PowerShell script that calls the mask data REST resource and then keeps checking the status until the process has completed.
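The same polling idea can be sketched in Python. This is not the PowerShell script mentioned above, just a minimal outline of the status loop. It deliberately does not hard-code any REST URL: get_status is a hypothetical callable that you would back with an HTTP GET against the status URL returned by the POST, and it is assumed to return the numeric status. The poll interval and retry limit are arbitrary choices.

```python
import time
from typing import Callable

RUNNING = -1  # per the post: the process is still running
SUCCESS = 0   # per the post: the process has completed


def wait_for_job(get_status: Callable[[], int],
                 poll_seconds: float = 10.0,
                 max_polls: int = 360,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll get_status until the job finishes, fails or the poll limit is hit."""
    for _ in range(max_polls):
        status = get_status()
        if status == SUCCESS:
            return status
        if status != RUNNING:
            # Any value other than -1 or 0 is treated as a failure.
            raise RuntimeError(f"mask data job failed with status {status}")
        sleep(poll_seconds)
    raise TimeoutError("mask data job was still running when polling stopped")
```

Passing the sleep function in as a parameter keeps the loop easy to test without waiting in real time.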

So how is the masking of data achieved? I know some people believe that if you put the word cloud into any sentence then some kind of magic occurs, but unfortunately I can’t always accept that. The clue was there when running the EPM Automate “maskdata” command:

“This command will export the data in masked format”

In the on-premise world a new feature was introduced into MaxL; the following is taken from the Essbase new features readme.

“The MaxL export data statement includes grammar you can use to make exported data anonymous, wherein real data is replaced with generated values. This removes the risk of sensitive data disclosure, and can be used in case a model needs to be provided to technical support for reproduction of certain issues.”

The Essbase tech ref provides additional information on how the data is masked. For BSO:

“Export data in anonymized format. Anonymization removes the risk of sensitive data disclosure, and can be used in case sample data needs to be provided for technical support. Essbase replaces real data values with incremental values beginning with 0, increasing by 1 for each value in the block.”

If I take the Sample Basic BSO database for simplicity I can demonstrate what is happening in the cloud using MaxL.

The above example is a full data export using the anonymous syntax and the output shows how the cells in each block have been replaced with incremental values.
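To make the per-block numbering concrete, here is a small Python simulation of the behaviour the tech ref describes. It is a sketch rather than Essbase code, and it rests on assumptions: a block is modelled as a list of cells with None standing in for #Missing, only cells that hold a value are renumbered, and the member names and numbers are made up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: one list of cells per sparse member combination.
Block = List[Optional[float]]


def anonymize_bso(blocks: Dict[Tuple[str, ...], Block]) -> Dict[Tuple[str, ...], Block]:
    """Replace the values in each block with 0, 1, 2, ... restarting per block."""
    masked: Dict[Tuple[str, ...], Block] = {}
    for key, cells in blocks.items():
        counter = 0  # the numbering restarts at 0 for every block
        new_cells: Block = []
        for value in cells:
            if value is None:
                new_cells.append(None)  # assumption: missing cells stay missing
            else:
                new_cells.append(float(counter))
                counter += 1
        masked[key] = new_cells
    return masked


# Made-up blocks keyed by sparse members.
blocks = {
    ("New York", "Cola", "Actual"): [678.0, 271.0, None, 407.0],
    ("New York", "Cola", "Budget"): [640.0, 260.0, 380.0, None],
}
masked_blocks = anonymize_bso(blocks)
```

Because the counter restarts for every block, two blocks with the same populated cells end up with identical masked values.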

I know the basic database usually has scenario as a dense dimension but I updated it to sparse for this example.

Each block in the database, including upper-level blocks, will have the same incremental values, which means the aggregated values for stored upper-level members will be incorrect.

For on-premise if you wanted the data to be aggregated correctly you could run the anonymous export for level 0, clear, load and then aggregate. For the cloud, you don’t have that control so you could run the mask data command, clear upper level data and then aggregate with a business rule.
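A tiny worked example shows why the re-aggregation matters. The Python below is only an illustration with made-up values: it assumes two level 0 blocks and one stored parent block, each masked to the same 0, 1, 2 sequence, and a simple additive rollup.

```python
from typing import List


def aggregate(children: List[List[float]]) -> List[float]:
    """Sum the child blocks cell by cell, as a plain additive rollup would."""
    return [sum(cells) for cells in zip(*children)]


# Made-up masked blocks: every block was given the same incremental values.
masked_new_york = [0.0, 1.0, 2.0]
masked_florida = [0.0, 1.0, 2.0]
masked_east_as_exported = [0.0, 1.0, 2.0]  # the parent block was masked too

# Clearing the upper level and aggregating from level 0 gives a consistent parent.
east_reaggregated = aggregate([masked_new_york, masked_florida])
```

In this example the re-aggregated parent is the sum of its children, whereas the parent block as exported simply repeats the same sequence.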

A spreadsheet retrieve highlights the difference before and after masking the data.

Moving on to ASO, which uses a slightly different approach to masking the data. The documentation provides further details on this:

“Export data in anonymized format. Anonymization removes the risk of sensitive data disclosure, and can be used in case sample data needs to be provided for technical support. Essbase replaces real data values with 1, for each value in the block.”

I am not sure I agree with the statement about block and would prefer it to say input-level cell. The way the data is anonymized is also quite primitive, but at least it gives a true reflection of where the data did exist.

Anyway, let us take another example of masking the data by using the anonymous syntax against the ASO sample application.

The exported data contains all the level 0 data values replaced with a 1.
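The ASO behaviour is simple enough to model in a couple of lines. Again, this is a hedged Python sketch and not Essbase code: the cube is assumed to be a dict of level 0 cell coordinates to values, and the member names and numbers are invented.

```python
from typing import Dict, Tuple

# Hypothetical model: level 0 cell coordinates mapped to values.
Cells = Dict[Tuple[str, ...], float]


def anonymize_aso(cells: Cells) -> Cells:
    """Replace every stored level 0 value with 1, keeping the same coordinates."""
    return {coords: 1.0 for coords in cells}


level0 = {
    ("Jan", "Units", "Cola"): 320.0,
    ("Feb", "Units", "Cola"): 415.0,
}
masked_level0 = anonymize_aso(level0)
```

The set of coordinates is unchanged, so the masked export still shows where data existed even though every value is now the same.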

Another spreadsheet retrieve shows the difference before and after anonymizing the data.

I am going to leave it there, but hopefully you now understand the concept of how the data mask functionality works. Even though it is a cloud command, the process can be replicated on-premise, and with a greater level of flexibility.
