AKABI’s Consultants Share Insights from Dataminds Connect 2023

November 20, 2023

Analytics | Business Intelligence | Data Integration | Event | Microsoft Azure

Read in 5 minutes

Dataminds Connect 2023, a two-day event held in the charming city of Mechelen, Belgium, has proven to be a cornerstone event for IT professionals and Microsoft data platform enthusiasts. Partly sponsored by AKABI, it gathers professionals and experts who share their knowledge and insights in the world of data.

With a special focus on the Microsoft Data Platform, Dataminds Connect has become a renowned destination for those seeking the latest advancements and best practices in the world of data. We were privileged to have some of our consultants attend this exceptional event and we’re delighted to share their valuable feedback and takeaways.

How to Avoid Data Silos – Reid Havens

In his presentation, Reid Havens emphasized the importance of avoiding data silos in self-service analytics. He stressed the need for providing end users with properly documented datasets, making usability a top priority. He suggested using Tabular Editor to hide fields or make them private to prevent advanced users from accessing data not meant for self-made reports. Havens’ insights provided a practical guide to maintaining data integrity and accessibility within the organization.

Context Transition in DAX – Nico Jacobs

Nico Jacobs took on the complex challenge of explaining the concept of “context” and circular dependencies within DAX. He highlighted that while anyone can work with DAX, not everyone can understand its reasoning. Jacobs’ well-structured presentation made it clear how context influences DAX and its powerful capabilities. Attendees left the session with a deeper understanding of this essential language.

Data Modeling for Experts with Power BI – Marc Lelijveld

Marc Lelijveld’s expertise in data modeling was on full display as he delved into various data architecture choices within Power BI. He effortlessly navigated topics such as cache, automatic and manual refresh, Import and Dual modes, Direct Lake, Live Connection, and Wholesale. Lelijveld’s ability to simplify complex concepts made it easier for professionals to approach new datasets with confidence.

Breaking the Language Barrier in Power BI – Daan Lambrechts

Daan Lambrechts addressed the challenge of multilingual reporting in Power BI. While the tool may not inherently support multilingual reporting, Lambrechts showcased how to implement dynamic translation mechanisms within Power BI reports using a combination of Power BI features and external tools like Metadata Translator. His practical, step-by-step live demo left the audience with a clear understanding of how to meet the common requirement of multilingual reporting for international and multilingual companies.

Lessons Learned: Governance and Adoption for Power BI – Paulien van Eijk & Teske van Maaren

This enlightening session focused on the (re)governance and (re)adoption of Power BI within organizations where Power BI is already in use, often with limited governance and adoption. Paulien van Eijk and Teske van Maaren explored various paths to success and highlighted key concepts to consider:

  • Practices: Clear and transparent guidance and control on what actions are permitted, why, and how.
  • Content Ownership: Managing and owning the content in Power BI.
  • Enablement: Empowering users to leverage Power BI for data-driven decisions.
  • Help and Support: Establishing a support system with training, various levels of support, and community.

Power BI Hidden Gems – Adam Saxton & Patrick Leblanc

Participating in Adam Saxton and Patrick Leblanc’s “Power BI Hidden Gems” session was a truly enlightening experience. These YouTube experts presented topics like query folding, preferring Dual over Import mode, model properties (discouraging implicit measures), Semantic Link, Deneb, and incremental refresh in a clear and engaging manner. Their presentation style made even the most intricate aspects of Power BI accessible and easy to grasp, and the production quality, a hallmark of experienced YouTubers, made the learning experience both enjoyable and informative.

The Combined Power of Microsoft Fabric for Data Engineer, Data Analyst and Data Governance Manager – Ioana Bouariu, Emilie Rønning and Marthe Moengen

I had the opportunity to attend the session entitled “The Combined Power of Microsoft Fabric for Data Engineer, Data Analyst, and Data Governance Manager”. The speakers adeptly showcased the collaborative potential of Microsoft Fabric, illustrating its relevance in our evolving data landscape and the seamless collaboration it enables among data engineering, analysis, and governance roles. In our environment, where these roles can be held by distinct teams or even a single versatile individual, Microsoft Fabric emerges as a unifying force. Its adaptability addresses the needs of diverse profiles, making it an asset for both specialized teams and agile individuals, and its potential promises to open exciting new perspectives for the future of data management.

Behind the Hype, Architecture Trends in Data – Simon Whiteley

I thoroughly enjoyed Simon Whiteley’s seminar on the impact of hype in technology trends. He offered valuable insights into critically evaluating emerging technologies, highlighting their journey from experimentation to maturity through Gartner’s hype curve model.

Simon’s discussion on attitudes towards new ideas, the significance of healthy skepticism, and considerations for risk tolerance was enlightening. The conclusion addressed the irony of consultants cautioning against overselling ideas, emphasizing the importance of skepticism. The section on trade-offs in adopting new technologies provided practical insights, especially in balancing risk and fostering innovation.

In summary, the seminar provided a comprehensive understanding of technology hype, offering practical considerations for navigating the evolving landscape. Simon’s expertise and engaging presentation style made it a highly enriching experience.

In Conclusion

Dataminds Connect 2023 was indeed a remarkable event that provided valuable insights into the world of data. We want to extend our sincere gratitude to the organizers for putting together such an informative and well-executed event. The knowledge and experiences gained here will undoubtedly contribute to our continuous growth and success in the field. We look forward to being part of the next edition and the opportunity to continue learning and sharing our expertise with the data community. See you next year!

Vincent Hermal, Azure Data Analytics Practice Leader
Pierre-Yves Richer, Azure Data Engineering Practice Leader
with the invaluable contributions of Sophie Opsommer, Ethan Pisvin, Pierre-Yves Outlet and Arno Jeanjot



Free BI Solution Architecture

April 18, 2022

Business Intelligence

Read in 3 minutes

Nowadays, a large number of companies understand the need to be data-oriented and to exploit the data they generate to grow their success.

However, some companies still have no idea of the power of data and may be reluctant to invest considerable amounts of money in data management systems with no guarantee of benefits. For these companies, a fully free, open-source data management stack can be a nice way to demonstrate the added value of business intelligence systems. We built a complete free, open-source data management system, and we’re going to show you how easy it can be.

Data Integration

First of all, we use Talend Open Studio for Data Integration (TOS) as our ETL to gather data from different sources (databases, APIs, flat files, …) and to integrate that data into our data warehouse.

Figure 1: Example of sourcing job

TOS is used both to fetch data and to integrate it into data marts. The sourcing and storing flows are distinct and executed independently; to ease support, every part of the process can also be run on its own.

Figure 2: Example of storing metajob

Database

A database is needed to store all the data produced by the gathering and integration processes built as Talend jobs. We chose PostgreSQL, both because Pentaho’s repository database can itself be a PostgreSQL instance and for its rich SQL dialect.

Collaboration

As the BI solution grows day after day, so does the BI team working on it. To work together on the same solution, we need a collaboration tool that lets us save our work in a shared repository. The tool we use is Git. Git lets us version many types of files, from documentation to ETL jobs, including report definition files, so that everyone always works on the latest version of each file without having to ask the team every time.

Orchestration

It’s important to have jobs that gather and process information, and meta-jobs to combine them. It’s equally important to be able to schedule these meta-jobs in the right order and at a given frequency. This is called orchestration, and the tool we use is GoCD.

GoCD allows us to schedule our meta-jobs, built in Talend, at a certain time of day.

Figure 3: defining the scheduling

Basically, GoCD is organized around pipelines. A pipeline is composed of several stages that are executed one after the other.

Figure 4: list of stages

Our pipeline is linked to a specific Git repository, and each stage picks a specific job inside this repository, based on several variables defined beforehand, and executes it in a specific environment.

Figure 5: Link to Git repository

Figure 6: Task content

Figure 7: Stage variables

Figure 8: Pipeline variables
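To give an idea of what such a setup looks like, here is a rough sketch using GoCD’s YAML config plugin. The pipeline name, repository URL, and script names are invented for the example; our actual pipelines were configured through the GoCD UI shown in the figures.

```yaml
format_version: 10
pipelines:
  bi-nightly-load:                 # hypothetical pipeline name
    group: bi
    materials:
      jobs-repo:
        git: https://git.example.com/bi/talend-jobs.git   # assumed repository
        branch: master
    environment_variables:
      TALEND_ENV: production       # example variable passed to the stages
    stages:
      - sourcing:                  # first stage: gather data from the sources
          jobs:
            run:
              tasks:
                - exec:
                    command: sh
                    arguments:
                      - sourcing/sourcing_metajob_run.sh
      - storing:                   # second stage: load the data marts
          jobs:
            run:
              tasks:
                - exec:
                    command: sh
                    arguments:
                      - storing/storing_metajob_run.sh
```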

Exploitation

Finally, we exploit our data using some of Hitachi Vantara’s Pentaho solutions. Basic reports are built with Pentaho Report Designer (PRD), a WYSIWYG (“What You See Is What You Get”) tool. These reports use custom SQL queries as data sources, for example.

Figure 9: PRD user interface

The reports can then be generated from Pentaho User Console (which manages users and reports) or scheduled on a fixed time basis and sent by email.

Figure 10: example of report

We also use the Pentaho Community Dashboard Editor (CDE) to create dashboards. These dashboards can be accessed from the Pentaho User Console or embedded into web pages.

The last Pentaho solution we use is Mondrian. It helps us create multidimensional cubes, which can then act as data sources for CDE dashboards, PRD reports, or Excel sheets, for instance.
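For illustration, here is a minimal sketch of a Mondrian schema; the cube, table, and column names are invented for the example and are not from our actual model.

```xml
<Schema name="SalesSchema">
  <Cube name="Sales">
    <Table name="fact_sales"/>  <!-- assumed fact table -->
    <Dimension name="Product" foreignKey="product_id">
      <Hierarchy hasAll="true" primaryKey="id">
        <Table name="dim_product"/>  <!-- assumed dimension table -->
        <Level name="Category" column="category"/>
        <Level name="Product" column="name"/>
      </Hierarchy>
    </Dimension>
    <!-- measures aggregated from the fact table columns -->
    <Measure name="Quantity" column="quantity" aggregator="sum"/>
    <Measure name="Revenue" column="amount" aggregator="sum" formatString="#,###.00"/>
  </Cube>
</Schema>
```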

Conclusion

In conclusion, to build a free BI Solution Architecture, we used the following stack:

  • Talend Open Studio for Data Integration
  • PostgreSQL database
  • Git versioning
  • GoCD (Go Continuous Delivery)
  • Pentaho Report Designer
  • Pentaho Community Dashboard Editor
  • Pentaho Mondrian



Web service tuning with Talend ESB

January 28, 2019

IT

Read in 5 minutes

In this post, I’ll show you what I did to optimize a REST web service built with Talend ESB, using Apache Camel and the Apache CXF Camel component. In this basic scenario, the web service reads records from a database using plain SQL queries (no JPA), and the response is marshalled, pretty-printed, and rendered.

Introduction

For this technical demo, I’ll mount an H2 database for testing purposes and launch some performance tests using Apache JMeter; the project source code and the JMeter test project can be found on my GitHub account.

I just want to measure the throughput and the average response time, then try to improve them by adding some extra functionality. The aim is to push throughput as high as possible and response time as low as possible.

So I’ll try the following leads; don’t hesitate to give me your own in the comment section 😉 this post remains open to updates.

  • Changing the Camel data format used for marshalling/unmarshalling the JSON output; indeed, some frameworks are faster than others at this task, you can check this here.
  • Switching to a new JDBC connection pool like HikariCP which, according to its benchmarks, clearly outperforms c3p0, Tomcat, Vibur, and dbcp2.
  • Using a cache manager like Ehcache; memory or disk cache hits are faster than database hits.
  • Unfortunately, multithreading is useless in this case; if you have advice on this, send me a comment.

To be clear, the aim is mainly a developer-side improvement, just a quick and dirty proof of concept. To be more accurate, it would be better to launch the JMeter test in non-GUI mode and use a production-ready RDBMS instead of H2; you can also improve performance with engine-specific parameters, such as MySQL’s prepared-statement caching capability.

Test scenario with JMeter

As I said previously, I mounted an H2 database with only one table and 4 test records.

Here is my thread group, corresponding to 10 users making 10 thousand calls each.

I generate a random id value between 1 and 4.

And finally call the web service with the generated id value.
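For reference, assuming the stock JMeter functions (the request path itself is a made-up example), the random id can come straight from JMeter’s __Random function in the HTTP sampler path:

```
/dataservice/records/${__Random(1,4)}
```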

Creating the Camel route

Here is the Camel route, very simple, nothing more and nothing less.


Just change the bindingStyle to SimpleConsumer to get the web service URL parameter values into the message header.


I use the message header to inject the parameter value directly into the SQL query built in the body.


I finally marshal the SQL result with the Camel data format component.


Changing the dataformat

I’ll try three of them: Jackson, XStream, and a new challenger named Fastjson from the Alibaba group, available since Camel 2.20. Unfortunately, I had to build the camel-fastjson-alldep jar myself, as it isn’t packaged with the Talend 7.1.1 solution; sounds strange, but anyway, you can use the Maven assembly plug-in with this dependency:

<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-fastjson</artifactId>
  <version>2.21.2</version>
</dependency>

Let’s see the result with JMeter

Database connection with Spring and Jackson data format


Database connection with Spring and Xstream data format


Database connection with Spring and Fastjson data format


It seems that Fastjson is the winner, with 550 requests per second and a gain of 20%, so I’ll keep it for the next improvement step.

The response time tends to 17 ms. Not bad, but let’s see if we can do better 😉


Changing the driver

If you take a look at the HikariCP GitHub page, you’ll see that HikariCP is the perfect challenger, and the good news is that there is no source code to change with the Camel bean register, only the JDBC connection parameters.

The first attempt was done with the Spring transaction manager, as you can see with the bean register component.

Note the Spring 4 library found on the Maven repository; it is not provided with Talend version 7.


The bean register looks like this for the Spring connection. I used the SimpleDriverDataSource recommended by Red Hat if you plan to deploy this route in an OSGi environment.


Just add the 143 KB library jar you can find here into the cConfig component…


… and change the Camel bean register with the HikariCP parameters, the bean id doesn’t even change 😉
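As a rough sketch of what that bean registration can look like (assuming a Spring-style bean definition; the JDBC URL and pool size below are illustrative, not the values from my test):

```xml
<!-- HikariCP pooled DataSource registered under the same bean id as before -->
<bean id="dataSource" class="com.zaxxer.hikari.HikariDataSource" destroy-method="close">
  <property name="jdbcUrl" value="jdbc:h2:tcp://localhost/~/test"/>  <!-- illustrative URL -->
  <property name="username" value="sa"/>
  <property name="password" value=""/>
  <property name="maximumPoolSize" value="10"/>  <!-- example pool size -->
</bean>
```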


Let’s see the result with JMeter


Simply by changing the JDBC connection pool, we gain 80%, with a throughput of 995 hits/s.

The average response time drops to 9 ms. Better.


Adding a cache manager

Now the route is a little more complicated, as we have to put and get values from a caching system named Ehcache; Apache Camel can use it through its dedicated Ehcache component.

For my test, I create an on-disk cache located in the user temporary space, with a one-second time to live. I wanted to simulate cache expiry in order to trigger additional calls to the database beyond the four calls corresponding to my four poor records.

Changing the Camel route


I use a cDirect; the Ehcache endpoint is always the same.


Add an XML resource file; this file stores all the Ehcache parameters:


Here is the file content:

<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://www.ehcache.org/v3"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.ehcache.org/v3 http://www.ehcache.org/schema/ehcache-core-3.5.xsd">
  <persistence directory="${java.io.tmpdir}" />
  <cache alias="dataServiceCache">
    <key-type>java.lang.String</key-type>
    <value-type>java.lang.String</value-type>
    <expiry>
      <ttl unit="seconds">1</ttl>
    </expiry>
    <resources>
      <heap unit="entries">100</heap>
      <disk persistent="true" unit="MB">10</disk>
    </resources>
  </cache>
</config>

The exchange message header contains keywords you have to set, like CamelEhcacheAction, CamelEhcacheKey, and CamelEhcacheValue, to choose the action and store the key/value pair; for more information, see the documentation page.


Tip: define the key as a String type; it is more efficient than the default Object type, and the throughput will be better.


Let’s see the result with JMeter


Simply by adding a cache manager, we gain another 50% (400% from the first try), with a score of 1972 hits/s.

The average response time drops again, to 4 ms!


Conclusion

Changing the JDBC connection pool increased the throughput by 80%, and adding a cache manager gave me a 400% gain with only 336 hits on the database. Web services are very challenging; as I’m not the ultimate expert, I rethink my approach every time I start this kind of development. My approach here was to merge the best of the examples I could find in the official Apache Camel repository, taking all the best practices. Leave a comment to share your thoughts, and thanks for reading.



Docker image for Talend MDM web UI

January 21, 2019

IT

Read in 1 minutes

Manually installing any new web application is still a mess; thanks to Docker, we’ll see how to streamline the Talend MDM web user interface installation. I’ll describe all the steps to bring up the web UI in only a few minutes.

Introduction

Fortunately, the Talend MDM web UI is packaged as an executable jar which can be driven by a response file in unattended mode. This file is just a script in XML format read by the installer during the installation process; see the Talend documentation for further information about the silent mode.

Even if Talend recommends its list of compatible OSes and the official Oracle JVM, I used an Alpine base image and an open-source JVM in order to get the lightest possible image (239 MB), and it works like a charm 😉


Installation

The Docker image is hosted here; for information, the latest Docker tag corresponds to the last stable Talend version (7.1.1, released on October 26, 2018).

Launch the Docker daemon and use this command to pull the image:

Finally, use this other command to run the container:
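The original commands were shown as screenshots; as a sketch, they would look like the following. The image name is hypothetical (use the repository from the Docker Hub page mentioned above), and 8180 is Talend MDM’s usual HTTP port, so adjust it if your image differs.

```shell
# Pull the image (image name is a placeholder; use the real repository)
docker pull myrepo/talend-mdm-webui:7.1.1

# Run the container in the background, exposing the MDM web UI port
docker run -d --name talend-mdm -p 8180:8180 myrepo/talend-mdm-webui:7.1.1
```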

Open your favourite browser and go to the Talend MDM welcome page; the web UI is now ready to receive all the Talend MDM objects you’ll push with the Open Studio client. See my previous post to see how to do it.


Hope that helps and thanks for reading 😉
