Free BI Solution Architecture

April 18, 2022

Business Intelligence

Read in 3 minutes

Nowadays, many companies understand the need to be data-driven and to exploit the data they generate in order to extend their success.

However, some companies still have no idea of the power of data and may be reluctant to invest considerable amounts of money in data management systems with no guarantee of benefits. For these companies, a fully free, open-source data management solution can be a nice way to demonstrate the added value of business intelligence. We built such a complete, free, open-source data management system, and we’re going to show you how easy it can be.

Data Integration

First of all, we use Talend Open Studio for Data Integration (TOS) as our ETL tool to gather data from different sources (databases, APIs, flat files, …) and to load it into our data warehouse.

Figure 1: example of a sourcing job

TOS is used both to source data and to integrate it into datamarts. The sourcing and storing flows are distinct and executed independently, so every part of the process can be run on its own, which eases support.

Figure 2: example of a storing meta-job

Database

A database is necessary to store all the data produced by the gathering and integration processes built as Talend jobs. We chose PostgreSQL. This choice was driven by how well it fits the rest of the stack (Pentaho’s repository database can itself be hosted on PostgreSQL) as well as by its rich SQL dialect.
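
To give an idea of what ends up in the warehouse, here is a minimal sketch of a datamart star schema in PostgreSQL. The table and column names (dim_date, fact_sales, …) are hypothetical and only serve as an illustration; they are not part of the original solution.

-- Hypothetical datamart tables (illustration only)
CREATE TABLE dim_date (
    date_key   integer PRIMARY KEY,  -- e.g. 20220418
    full_date  date    NOT NULL,
    year       integer NOT NULL,
    month      integer NOT NULL
);

CREATE TABLE fact_sales (
    sale_id      bigserial PRIMARY KEY,
    date_key     integer NOT NULL REFERENCES dim_date (date_key),
    customer_id  integer NOT NULL,
    amount       numeric(12,2) NOT NULL
);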

Collaboration

As the BI solution grows day after day, the BI team working on it grows as well. To be able to work together on the same solution, we need a collaboration tool that lets us save our work in a shared repository. The tool we use is GIT. GIT lets us version multiple types of files, from documentation to ETL jobs, including report definition files, so that we are always working on the latest version of each file without having to ask the team every time.

Orchestration

It’s important to have jobs that gather and process data, and meta-jobs that combine them. It’s just as important to have a way to schedule these meta-jobs, in the right order and at a given frequency. This is called orchestration, and the tool we use is GoCD.

GoCD allows us to schedule the meta-jobs built in Talend at a certain time of the day.

Figure 3: defining the scheduling

GoCD is organised around pipelines. A pipeline is composed of several stages that are executed one after the other.

Figure 4: list of stages

Our pipeline is linked to a specific GIT repository; each stage takes a specific job from this repository, based on variables defined beforehand, and executes it in a specific environment.

Figure 5: link to GIT repository

Figure 6: task’s content

Figure 7: stage’s variables

Figure 8: pipeline’s variables

Exploitation

Finally, we exploit our data using some of Hitachi Vantara’s Pentaho solutions. Basic reports are built with Pentaho Report Designer (PRD), a “What You See Is What You Get” tool. The report data comes from data sources such as custom SQL queries.
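
For instance, a PRD report data source could be a query like the one below, run against the hypothetical datamart sketched earlier (the table names are illustrative, not taken from the original solution):

-- Hypothetical PRD data source: monthly revenue per customer (illustration only)
SELECT d.year,
       d.month,
       f.customer_id,
       SUM(f.amount) AS total_amount
FROM   fact_sales f
JOIN   dim_date   d ON d.date_key = f.date_key
GROUP  BY d.year, d.month, f.customer_id
ORDER  BY d.year, d.month, f.customer_id;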

Figure 9: PRD user interface

The reports can then be generated from the Pentaho User Console (which manages users and reports) or scheduled at fixed times and sent by email.

Figure 10: example of a report

We also use the Pentaho Community Dashboard Editor (CDE) to create dashboards. These dashboards can be accessed through the Pentaho User Console or embedded in a web page.

The last Pentaho solution we use is Mondrian, which helps us create multidimensional cubes. Those cubes can then act as data sources for CDE dashboards, PRD reports or Excel sheets, for instance.

Conclusion

In conclusion, to build a free BI Solution Architecture, we used the following stack:

  • Talend Open Studio for Data Integration
  • PostgreSQL Database
  • GIT Versioning
  • Go Continuous Delivery
  • Pentaho Report Designer
  • Pentaho Community Dashboard Editor
  • Pentaho Mondrian


Purge ODI Logs through Database

May 29, 2020

Business Intelligence Data Integration

Read in 3 minutes

If you generate a lot of logs in ODI, purging them through the ODI built-in mechanism can be very slow. It is a lot faster to do it directly in the database, but you have to respect the foreign keys. Here is a sample PL/SQL script to do so.

It is a simple script with one parameter, the number of days of logs you want to keep: it retrieves the corresponding session numbers and deletes from the log tables following the dependencies.
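
The original script is not reproduced here, so below is only a minimal sketch of the idea in PL/SQL, assuming typical ODI work-repository tables and columns (SNP_SESSION, SNP_SESS_TASK_LOG, SNP_STEP_LOG, SESS_NO, SESS_END). The exact set of child log tables and their foreign keys depends on your ODI version, so check your own repository before running anything similar.

-- Minimal purge sketch (assumed table/column names, verify against your ODI repository)
DECLARE
  p_days_to_keep CONSTANT NUMBER := 30;  -- number of days of logs to keep
BEGIN
  -- Delete child log tables first to respect foreign keys
  DELETE FROM snp_sess_task_log
   WHERE sess_no IN (SELECT sess_no FROM snp_session
                      WHERE sess_end < SYSDATE - p_days_to_keep);

  DELETE FROM snp_step_log
   WHERE sess_no IN (SELECT sess_no FROM snp_session
                      WHERE sess_end < SYSDATE - p_days_to_keep);

  -- Finally delete the parent sessions
  DELETE FROM snp_session
   WHERE sess_end < SYSDATE - p_days_to_keep;

  COMMIT;
END;
/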


Clean ODI Scenario with Groovy

September 27, 2019

Business Intelligence Data Integration

Read in 1 minute

You may generate a lot of scenarii when developing an ODI project. When promoting or committing to git, you are usually only interested in the last functional scenario.

All the past versions being stored in git or promoted, you may like to clear all scenarii. If so, this Groovy script may help you. It deletes all scenarii but the last version (sorted by version name, take care…).

The code has two parameters: the project code and a pattern for the packages. This may help you target specific scenarii.

//Imports core
import oracle.odi.core.persistence.transaction.ITransactionDefinition;
import oracle.odi.core.persistence.transaction.support.DefaultTransactionDefinition;
import oracle.odi.core.persistence.transaction.ITransactionManager;
import oracle.odi.core.persistence.transaction.ITransactionStatus;

//Imports odi Objects
import oracle.odi.domain.project.OdiPackage;
import oracle.odi.domain.runtime.scenario.OdiScenario;
import oracle.odi.domain.project.finder.IOdiPackageFinder;
import oracle.odi.domain.runtime.scenario.finder.IOdiScenarioFinder;


// Parameters -- TO FILL --
String sourceProjectCode = 'MY_PROJECT_CODE';
String sourcePackageRegexPattern = '.*'; // regex used with String.matches(), so '.*' matches every package


println "    Start Scenarios Deletion";
println "-------------------------------------";

//Setup Transaction
ITransactionDefinition txnDef = new DefaultTransactionDefinition();
ITransactionManager tm = odiInstance.getTransactionManager();
ITransactionStatus txnStatus = tm.getTransaction(txnDef);

int scenarioDeletedCounter = 0;

try {
  //Init Scenario Finder
  IOdiScenarioFinder odiScenarioFinder = (IOdiScenarioFinder)odiInstance.getTransactionalEntityManager().getFinder(OdiScenario.class);
  //Loops through all packages in the target project/folders
  for (OdiPackage odiPackageItem : ((IOdiPackageFinder)odiInstance.getTransactionalEntityManager().getFinder(OdiPackage.class)).findByProject(sourceProjectCode)){
    // Only process packages whose name matches the pattern
    if (!odiPackageItem.getName().matches(sourcePackageRegexPattern)) {
      continue;    
    }
    println "Deleting Scenarii for Package " + odiPackageItem.getName();
    
    def odiScenCollection = odiScenarioFinder.findBySourcePackage(odiPackageItem.getInternalId());
    def maxOdiScen = odiScenCollection.max{ it.getVersion() }; // keep the latest version (sorted by version name)
    if (maxOdiScen != null) {
      for (OdiScenario odiscen : odiScenCollection ) {
        if (odiscen != maxOdiScen){
          println "Deleting Scenari "+ odiscen.getName() + " " + odiscen.getVersion();
          odiInstance.getTransactionalEntityManager().remove(odiscen);
          scenarioDeletedCounter ++;
        }
      }
    }
 }   
// Commit transaction
tm.commit(txnStatus);


println "---------------------------------------------------";
println "     " + scenarioDeletedCounter + " Scenarios deleted Sccessfully";
println "---------------------------------------------------";

} 
catch (Exception e)
{
  // Roll back the open transaction and print the exception
  tm.rollback(txnStatus);
  println "---------------------ERROR-------------------------";
  println(e);
  println "---------------------------------------------------";
  println "     FAILURE : Scenarios Deletion failed";
  println "---------------------------------------------------";
}


Human and Machine Learning

March 17, 2019

Business Intelligence Event

Read in 2 minutes

I had the opportunity to attend the 2019 Gartner Data & Analytics Summit in London. Here is a wrap-up of some notes I took during the sessions.

A few years ago, AI was a subject of fear for the future. Now it’s a fact: Machine Learning is part of the present. We are no longer in a Humans vs. Machines contest; the goal is to free human resources for higher-end tasks. Humans and Machines…

Do you still have a problem with terms like Artificial Intelligence or Machine Learning? No worries, just replace them with “Augmented”: Augmented Analytics, Augmented Data Management, Augmented Data Integration…

2019 will be Augmented. Not Human versus Machine but Human and Machine Learning at the service of a better Data World.

The new tools will let you operate as you are used to but, in the background, will run Machine Learning algorithms to suggest new visualisations, unexpected facts and correlations, and to save you from repetitive tasks…

  • All your integration flows share a common pattern? Your augmented tool will detect it and propose to create a new template automatically.
  • You select a set of analytics? Your augmented tool will propose a fitting visualisation.
  • You want to prepare a dataset? Your augmented tool will automatically suggest formatting corrections and data mappings, and learn from your choices.

If you plan to buy a new tool this year, be sure this is part of the roadmap.

Any other trends for 2019?
Many other trends were presented by Gartner; here are a couple of recurring ones from the sessions:

  • NLP. Natural Language Processing: new tools should be able to accept natural language as input (which allows voice input from Alexa, Cortana…).
  • DataOps. No one will deny that data is a field where requirements evolve quickly, which makes it a prime area for agile development methods. DataOps is a specialised version of DevOps practices, and it fits perfectly in an augmented world where most repetitive tasks should be automated.

On the non-technical side:

  • Data Literacy. Being a good technician is not enough if you work in the data world. You need to understand data and how it is, and can be, presented. Your ability to communicate around data is as important as your ability to manage it. That is what data literacy skills cover. Some training exists on the web, a must for any consultant.

And many more that you can find on the Gartner website or at future events.

Enjoy 2019 with machines.
