Purge ODI Logs through Database

May 29, 2020

Uncategorized

Read in 3 minutes

If you generate a lot of logs in ODI, purging through ODI built-in mechanism can be very slow. A lot faster to do it through Database, but you have to respect foreign keys. Here is a sample plsql script to do so.

Here is a simple script with one parameter which is the number of days of log you want to keep, it will there retrieve session number and delete in the logs table following the dependencies.

SHARE ON :



comments

Clean ODI Scenario with Groovy

September 27, 2019

Business Inteligence IT

Read in 1 minutes

You may generate a lot of scenarii when developping ODI projet. When promoting, commiting to git… …you are usually only interested in the last functionnal scenario.

All the past being stored in git or promoted, you may like to clear all all scenarii. If yes, this groovy script may help you. It will delete all scenarii but the last version (sorted by version name, take care…).

The code has two parameters, the project code and a pattern for the package. May help you target specific scenario.

//Imports core
import oracle.odi.core.persistence.transaction.ITransactionDefinition;
import oracle.odi.core.persistence.transaction.support.DefaultTransactionDefinition;
import oracle.odi.core.persistence.transaction.ITransactionManager;
import oracle.odi.core.persistence.transaction.ITransactionStatus;

//Imports odi Objects
import oracle.odi.domain.project.OdiPackage;
import oracle.odi.domain.runtime.scenario.OdiScenario;
import oracle.odi.domain.project.finder.IOdiPackageFinder;
import oracle.odi.domain.runtime.scenario.finder.IOdiScenarioFinder;


// Parameters -- TO FILL --
String sourceProjectCode = 'MY_PROJECT_CODE';
String sourcePackageRegexPattern = '*';


println "    Start Scenarios Deletion";
println "-------------------------------------";

//Setup Transaction
ITransactionDefinition txnDef = new DefaultTransactionDefinition();
ITransactionManager tm = odiInstance.getTransactionManager();
ITransactionStatus txnStatus = tm.getTransaction(txnDef);

int scenarioDeletedCounter = 0;

try {
  //Init Scenario Finder
  IOdiScenarioFinder odiScenarioFinder = (IOdiScenarioFinder)odiInstance.getTransactionalEntityManager().getFinder(OdiScenario.class);
  //Loops through all packages in target project/fodlers
  for (OdiPackage odiPackageItem : ((IOdiPackageFinder)odiInstance.getTransactionalEntityManager().getFinder(OdiPackage.class)).findByProject(sourceProjectCode)){
    // Only generate Scenario for package matching pattenr
    if (!odiPackageItem.getName().matches(sourcePackageRegexPattern)) {
      continue;    
    }
    println "Deleting Scenarii for Package " + odiPackageItem.getName();
    
    odiScenCollection = odiScenarioFinder.findBySourcePackage(odiPackageItem.getInternalId());
    maxOdiScen = odiScenCollection.max{it.getVersion()};
    if (maxOdiScen != null) {
      for (OdiScenario odiscen : odiScenCollection ) {
        if (odiscen != maxOdiScen){
          println "Deleting Scenari "+ odiscen.getName() + " " + odiscen.getVersion();
          odiInstance.getTransactionalEntityManager().remove(odiscen);
          scenarioDeletedCounter ++;
        }
      }
    }
 }   
// Commit transaction
tm.commit(txnStatus);


println "---------------------------------------------------";
println "     " + scenarioDeletedCounter + " Scenarios deleted Sccessfully";
println "---------------------------------------------------";

} 
catch (Exception e)
{
  // Print Execption
  println "---------------------ERROR-------------------------";
  println(e);
  println "---------------------------------------------------";
  println "     FAILURE : Scenarios Deletion failed";
  println "---------------------------------------------------";
}

SHARE ON :


Related articles

April 18, 2022

Read in 3 minutes

Free BI Solution Architecture

Nowadays a wide number of companies understand the need to be data-oriented and to exploit the generated data to extend their successfulness. However, some comp...

March 17, 2019

Read in 2 minutes

Human and Machine Learning

I had the opportunity to attend the 2019 Gartner Data & Analytics Summit at London. Here is a wrap up of some notes I took during the sessions. Few years ag...

January 28, 2019

Read in 5 minutes

Web service tuning with Talend ESB

In this post, I’ll show you what I did to optimize a REST web service build with Talend ESB using Apache Camel and Apache CXF camel compon...


comments

Human and Machine Learning

March 17, 2019

Business Inteligence Event IT

Read in 2 minutes

I had the opportunity to attend the 2019 Gartner Data & Analytics Summit at London. Here is a wrap up of some notes I took during the sessions.

Few years ago, AI was a subject of fear for the future. Now it’s a fact, Machine Learning is part of the present. We are not anymore in a challenge Humans vs Machines, goal is to free human resources for higher end tasks. Humans and Machines…

You still have a problem with terms like Artificial Intelligence, Machine Learning? No worries, just replace them with “Augmented“.
Augmented Analytics, Augmented Data Management, Augmented Data Integration…

2019 will be Augmented. Not Human versus Machine but Human and Machine Learning at the service of a better Data World.

The new tools will let you operate as you used too but, in the background, will run Machine Learning algorithm to suggest you new vizualisations, unexpected facts, correlations, to save you from repetitive task…

  • All your integration flows have a common pattern, your augmented tool will detect it and propose you to create a new template automatically.
  • You select a set of analytics, your augmented tool will propose a cool vizualisation.
  • You want to prepare a dataset, your augmented analytics will automatically suggest formatting corrections, data mapping and learn from your choices.

If you plan to buy a new tool this year, be sure this is part of the roadmap.

Any other trends for 2019?
Many other trends were presented by Gartner, here are a couple of recurring ones during the sessions :

  • NLP. Natural Language Processing, new tools should be able to accept natural language as input (which allow vocal input from Alexa, Cortona…).
  • DataOps. No-one will deny Data is a subject where requirements evolve quickly. This is thus a choice area to apply agile development methods. DataOps is a specialized version of DevOps practices. This fits perfectly in an augmented world where most repetitive tasks should be automated.

On a non-technical side :

  • Data Literacy. Being a good technician is not enough if you work in the data world. You need to understand data, how they are and can be presented. Your ability to communicate around the data is as important as your ability to manage them. This is what include the data literacy skills. Some training exists on the web, a must for any consultant.

And many more you can find on Gartner web site or at future events.

Enjoy 2019 with machines.13 Rue de la Libération, 5969 Itzig, Luxembourg

SHARE ON :


Related articles

April 18, 2022

Read in 3 minutes

Free BI Solution Architecture

Nowadays a wide number of companies understand the need to be data-oriented and to exploit the generated data to extend their successfulness. However, some comp...

September 27, 2019

Read in 1 minutes

Clean ODI Scenario with Groovy

You may generate a lot of scenarii when developping ODI projet. When promoting, commiting to git… …you are usually only interested in the last functionnal s...

January 28, 2019

Read in 5 minutes

Web service tuning with Talend ESB

In this post, I’ll show you what I did to optimize a REST web service build with Talend ESB using Apache Camel and Apache CXF camel compon...


comments

Web service tuning with Talend ESB

January 28, 2019

IT

Read in 5 minutes

In this post, I’ll show you what I did to optimize a REST web service build with Talend ESB using Apache Camel and Apache CXF camel component. In this basic scenario, the web service read records in database using SQL queries (no JPA) and the response will be marshalled, pretty printed and rendered.

Introduction

For this technical demo, I’ll mount a H2 database for testing purpose and launch some performance test using Apache JMeter, the project source code and the JMeter testing project can be found on my github account.

I just want to see the throughput and the average response time then I’ll try to increase it by adding some extra functionalities (or new features), the aim is to improve throughput at its best and response time to the lower possible value.

So I’ll try these following clues, don’t hesitate to give me your own in the comment section 😉 this post still open for an update.

  • Changing the camel data format for mashaling/unmarshaling the JSON output, indeed, some framework are faster than other in this task, you check this here.
  • Adding new JDBC driver like HirakiCP, according to graphics, not just better than c3p0, Tomcat, vibur, and dbcp2 just amazing.
  • Using a cache management like Ehcache, memory or disk caching hits are faster than database hits.
  • Unfortunately multithreading is useless in this case, if you have an advice on this, send me a comment.

To be clear, the aim is mainly a developer side improvement, just a quick and dirty proof of concept, for example, to be more accurate it’s better to launch the JMeter test in no gui mode and use a production ready RGDB instead of H2, e.g. you can improve performance with specific parameters like MySQL can do with prepareStatement caching capability.

Test scenario with JMeter

As I previously said, I mounted a H2 database with only one table and 4 test records.

Here is my thread group corresponding to 10 users and 10 thousand calls for each of them.

I generate a random id value between 1 and 4.

And finally call the web service with the generated id value.

Creating the Camel route

Here is the Camel route, very simple, nothing more and nothing less.

img5_webservice_tuning_talend_esb
img6_webservice_tuning_talend_esb

Just change the bindingStyle to simple consumer to get the web service URL parameters values into the header.

img7_webservice_tuning_talend_esb

I use the message header to directly parse value into the SQL query you build into the body.

img8_webservice_tuning_talend_esb
img9_webservice_tuning_talend_esb

I finally marshal the SQL result with the Camel data format component.

img10_webservice_tuning_talend_esb

Changing the dataformat

I’ll try three of them, Jackson, XStream and a new challenger named fastjson from Alibaba group existing since the 2.20 of Camel version. Unfortunately I had to create the camel-fastjson-alldep as it’s not already packaged with the Talend 7.1.1 solution, sounds strange but anyway, you can use the Maven assembly plug-in with this dependency

<dependency>

<groupId>org.apache.camel</groupId>

<artifactId>camel-fastjson</artifactId>

<version>2.21.2</version>

</dependency>

Let’s see the result with JMeter

Database connection with Spring and Jackson data format

img11_webservice_tuning_talend_esb

Database connection with Spring and Xstream data format

img12_webservice_tuning_talend_esb

Database connection with Spring and Fastjson data format

img13_webservice_tuning_talend_esb

It seems that fastjson is the winner with 550 request by second with a gain of 20%, so I’ll keep it for the next improvement step.

The response time tends to 17ms, not bad, but let see if we can do better 😉

img14_webservice_tuning_talend_esb

Changing the driver

If you take a look at the HikariCP github page, you’ll see that HikariCP is the perfect challenger and thanks to this good new, there is no source code to change with the camel bean register, only the JDBC connection parameters.

The first attempt was done with the spring transaction manager, as you can see with the bean register component.

See the spring 4 library found on Maven repository, not provided with Talend version 7.

img15_webservice_tuning_talend_esb

The bean register looks like this for the spring connection, I used the SimpleDriverDataSource recommended by red hat if you plan to deploy this route on an OSGI environment.

img16_webservice_tuning_talend_esb

Just add the 143ko library jar you can find here into the cConfig

img17_webservice_tuning_talend_esb

… and change the Camel bean register with the HikariCP parameters, the bean id doesn’t even change 😉

img18_webservice_tuning_talend_esb

Let’s see the result with JMeter

img19_webservice_tuning_talend_esb

Simply by changing the JDBC connection pool, we gain 80% with a throughput score of 995 hits/s

The average response time drops to 9ms, better.

img20_webservice_tuning_talend_esb

Adding a cache manager

Now the route is a little bit more complicated as we have to put and get value from a caching system named Ehcache, apache Camel is able to use it with its dedicated Ehcache component.

For my test, I create a cache on disk located on your user temporary space and a one second time to live, I wanted to simulate a cache expiry in order to make others calls onto the database instead of four calls corresponding to my four poor records.

Changing the Camel route

img21_webservice_tuning_talend_esb

I use a cDirect, the Ehcache endpoint is always the same

img22_webservice_tuning_talend_esb

Add a resource xml file, this file store all the Ehcache parameters:

img23_webservice_tuning_talend_esb

Here is the content file

<?xml version=“1.0” encoding=“UTF-8”?>
<config xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance&#8221;xsi:schemaLocation=http://www.ehcache.org/v3 http://www.ehcache.org/schema/ehcache-core-3.5.xsd&#8221;>
<persistence directory=“${java.io.tmpdir}” />
<cache alias=“dataServiceCache”>
<key-type>java.lang.String</key-type>
<value-type>java.lang.String</value-type>
<expiry>
<ttl unit=“seconds”>1</ttl>
</expiry>
<resources>
<heap unit=“entries”>100</heap>
<disk persistent=“true” unit=“MB”>10</disk>
</resources>
</cache>
</config>

The exchange message header contains keywords you have to customize like CamelEhcacheActionCamelEhcacheKey and CamelEhcacheValue to make the action and store the key/value, for more informations see the documentation page.

img24_webservice_tuning_talend_esb
img25_webservice_tuning_talend_esb

Tips: Define the key value as a String type, more efficient than Object type by default, the throughput will be better.

img26_webservice_tuning_talend_esb

Let’s see the result with JMeter

img27_webservice_tuning_talend_esb

Simply by adding a cache manager, we gain another 50% (400% from first try) with a score of 1972 hits/s

The average response time drops again to 4ms !!!

img28_webservice_tuning_talend_esb

Conclusion

Changing the JDBC driver increase the throughput by 80% and adding a cache manager give me a 400% gain with only 336 hits on the database. Web services are very challenging, as I’m not the ultimate expert, I think again when I start this kind of development. My approach was to merge the best of the examples I could find on the Apache Camel official repository by taking all the best practice. Leave a comment to share your feeling and thanks for reading.

SHARE ON :


Related articles

September 27, 2019

Read in 1 minutes

Clean ODI Scenario with Groovy

You may generate a lot of scenarii when developping ODI projet. When promoting, commiting to git… …you are usually only interested in the last functionnal s...

March 17, 2019

Read in 2 minutes

Human and Machine Learning

I had the opportunity to attend the 2019 Gartner Data & Analytics Summit at London. Here is a wrap up of some notes I took during the sessions. Few years ag...

January 21, 2019

Read in 1 minutes

Docker image for Talend MDM web UI

Installing manually any new web application is still a mess, thanks to Docker we’ll see how to gain leverage in the Talend web user interface installation. I&...


comments

Docker image for Talend MDM web UI

January 21, 2019

IT

Read in 1 minutes

Installing manually any new web application is still a mess, thanks to Docker we’ll see how to gain leverage in the Talend web user interface installation. I’ll describe all the steps to up the web UI in only a few minutes.

Introduction

Fortunately Talend MDM web UI is packaged with an executable jar which can be drive by a response file in an unattended mode. The file is just a script in xml format read by the installer during the installation process, see Talend documentation for further information about the silent mode.

Even if Talend recommends these compatible OS and official Oracle JVM, I used Alpine base image and open source JVM in order to get the lightest image as possible (239Mo) and it works like a charm 😉

img1_akabi_docker_talend_mdm_webui

Installation

The docker image is hosted here, for information, the latest docker version corresponds to the last Talend stable version (7.1.1 released the 26th of October 2018)

Launch docker daemon and use this command to pull the image

Finally, use this other command to run the container

Open your favourite browser and go to the Talend MDM welcome page, the webUI is now ready to receive all Talend MDM objects you’ll push with the open studio client, see my previous post to see how to do.

img2_akabi_docker_talend_mdm_webui.jpg

Hope that helps and thanks for reading 😉

SHARE ON :


Related articles

September 27, 2019

Read in 1 minutes

Clean ODI Scenario with Groovy

You may generate a lot of scenarii when developping ODI projet. When promoting, commiting to git… …you are usually only interested in the last functionnal s...

March 17, 2019

Read in 2 minutes

Human and Machine Learning

I had the opportunity to attend the 2019 Gartner Data & Analytics Summit at London. Here is a wrap up of some notes I took during the sessions. Few years ag...

January 28, 2019

Read in 5 minutes

Web service tuning with Talend ESB

In this post, I’ll show you what I did to optimize a REST web service build with Talend ESB using Apache Camel and Apache CXF camel compon...


comments