Free BI Solution Architecture

April 18, 2022

Business Inteligence

Read in 3 minutes

Nowadays a wide number of companies understand the need to be data-oriented and to exploit the generated data to extend their successfulness.

However, some companies still have no clue of the power of data and might be frigid to invest considerable amounts of money on data management systems with no guarantee of benefits. For these companies, offering a full free open-source data management can be a nice way to demonstrate the added value of business intelligence systems. We built a complete open-source free data management system and we’re going to show you how easy it can be.

Data Integration

First-of-all, we use Talend Open Studio for Data Integration(TOS) as our ETL to gather data from different sources (databases, API, flat files, …) and to integrate these data into our Datawarehouse.

Figure 1: Example of sourcing job

TOS is used both to get data and to integrate it into Datamarts. The sourcing and storing flows are distinct and executed independently. To improve support, every part of the process can be executed alone.

Figure 2: example of storing metajob

Database

A database is necessary to store all the data produced by the data gathering and integration processes built in Talend jobs. We chose PostgreSQL database. This choice was driven by its high application-suitability because Pentaho’s repository database can be a PostgreSQL DB as well as for its extended SQL language.

Collaboration

As the BI solution put in place grows day after day, the BI team working on this solution is also growing day after day. To be able to work together on the same solution, we need to have a collaboration tool that allows us to save works on a shared repository. The tool we’re using is GIT. GIT allows us to save multiple types of files, from documentation to ETL jobs, including Reports definition file so that we’re always working on the latest versions of each file without having to ask questions every time to the team.

Orchestration

It’s important to have jobs able to gather/treat information and meta-jobs to combine them. It’s also important to have a way to schedule these meta-jobs, in the right order, on a certain frequency. This is called Orchestration and the tool we use is GoCD.

GoCD allows us to schedule our meta-jobs, built from Talend, at a certain time of the day.

Figure 3: defining the scheduling

Basically, GoCD is used thanks to Pipelines. One Pipeline is composed of several Stages that are executed one after the other.

Figure 4: list of stages

Our Pipeline is linked to a specific GIT repository and each stage take a specific job, inside this GIT repository, based on several variables beforehand defined and execute it in a specific environment.

Figure 5: link to GIT repository

Figure 6: tasks’s content

Figure 7: stages’s variables

Figure 8: pipeline’s variables

Exploitation

Finally, we exploit our data using some of Hitachi Vantara’s Pentaho solutions. Basic reports are built using Pentaho report designer (PRD), which is a so-called “What You See, What You Get” tool. These reports data is built using custom SQL queries as data sources for example.

Figure 9: PRD user interface

The reports can then be generated from Pentaho User Console (which manages users and reports) or scheduled on a fixed time basis and sent by email.

Figure 10: example of report

We also use Pentaho Community Dashboard Editor (CDE) to create dashboards. These dashboards can be accessed using Pentaho User Console or can be web integrated.

The last Pentaho solution we use is Mondrian. It helps us to create multidimensional cubes. Those cubes can thereafter act as data sources for CDE Dashboards or PRD Reports or Excel sheets for instance.

Conclusion

In conclusion, to build a free BI Solution Architecture, we used the following stack:

  • Open Studio for Data Integration
  • PostgreSQL Database
  • GIT Versioning
  • Go Continuous Delivery
  • Pentaho Report Designer
  • Pentaho Community Dashboard Editor

SHARE ON :


Related articles

November 4, 2024

Read in minutes

AKABI’s consultants share insights from Dataminds Connect 2024: part 2 

Welcome to the second part of our Dataminds Connect 2024 recap! After covering the first two days of this event in our initial article, we’re excited to share...

October 18, 2024

Read in minutes

AKABI’s consultants share insights from Dataminds Connect 2024: part 1

The Dataminds Connect 2024 event, held in the picturesque city of Mechelen, Belgium, is a highly anticipated three-day gathering for IT professionals and Micros...

March 25, 2024

Read in minutes

Getting started with the new Power BI Visual Calculations feature!

Power BI’s latest feature release, Visual Calculations, represents a paradigm shift in how users interact with data.      Rolled ...


comments