AKABI’s consultants share insights from Dataminds Connect 2024: part 2 

November 4, 2024

AI Analytics Business Intelligence Data Integration Event Microsoft Azure


Welcome to the second part of our Dataminds Connect 2024 recap! After covering the first two days of this event in our initial article, we’re excited to share our feedback from the final day of the conference. This concluding day proved especially valuable, with in-depth sessions on Microsoft Fabric, Power BI, and Azure cloud solutions, providing practical perspectives for our ongoing and future projects. Join us as we explore the key highlights, lessons learned, and impactful discussions from the last Dataminds Connect. 

The Power of Paginated Reports – Nico Jacobs  

As we all know, paginated reports are the evolution of a very old technology: SSRS (SQL Server Reporting Services). But that doesn't mean they should be considered legacy! This option still has a lot to offer, and Mr. Jacobs illustrated this beautifully with five fundamental advantages, including export options, component nesting, and source flexibility.

Disaster Recovery Strategies for SQL Server – Andrew Pruski  

“A pessimist is an optimist with experience”, “Hope is not a strategy” (the Google SRE team motto), “Businesses don't care about SQL Server or Oracle; they care about data” – these are just a few of the key phrases that raise awareness of the importance of a contingency plan in the event of a technical problem. Mr. Pruski then proposed solutions and safeguards against the most common bad practices. The most important thing to remember is that you shouldn't worry about whether your database is backed up, but about how, and how quickly, the backup can be restored and made operational.
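
In that spirit, here is a minimal sketch (our own, not from the session) of an automated restore drill in Python with pyodbc; the server, backup path, and logical file names are hypothetical:

```python
# Minimal sketch of a restore drill: verify the backup media is readable,
# then time an actual restore, since restore time is what the business feels.
import time

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "Trusted_Connection=yes;TrustServerCertificate=yes",
    autocommit=True,  # RESTORE cannot run inside a user transaction
)
cursor = conn.cursor()

backup_file = r"C:\backups\Sales.bak"  # hypothetical backup location

# Step 1: confirm the backup set is readable at all.
cursor.execute(f"RESTORE VERIFYONLY FROM DISK = N'{backup_file}'")
while cursor.nextset():  # drain informational messages
    pass

# Step 2: restore to a scratch database and measure how long it takes.
start = time.time()
cursor.execute(
    f"RESTORE DATABASE Sales_DrillTest FROM DISK = N'{backup_file}' "
    "WITH MOVE 'Sales' TO 'C:\\data\\Sales_DrillTest.mdf', "  # hypothetical logical names
    "MOVE 'Sales_log' TO 'C:\\data\\Sales_DrillTest.ldf', REPLACE"
)
while cursor.nextset():  # wait for the restore to finish streaming messages
    pass
print(f"Restore completed in {time.time() - start:.0f}s")
```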

The Renaissance of Microsoft Purview – Tom Bouten

“If DATA is the new OIL, then METADATA will make you RICH” is the tagline of any data lineage tool, and it is how Mr. Bouten introduced Purview. The tool wasn't great when it first came out, but it is improving all the time. It's worth keeping an eye on, as it automates more and more processes and discoveries and will be used in more and more functions within a company. Thanks for the presentation and the refresher.

Start 2 MLOps: From the lab to production – Nick Verhelst  

In this MLOps session, we explored the machine learning lifecycle, emphasizing essential aspects like clear problem definitions, stakeholder alignment, and the importance of monitoring and quality assurance. These are foundational to successful outcomes in machine learning projects.

We also discussed the double diamond design process, which illuminated its role in business and data understanding, showing how alternating between problem definition and solution exploration helps guide the ML lifecycle.

The session gave me a comprehensive overview of the ML project lifecycle, stressing the importance of structure, collaboration, and the right tools: by balancing creative exploration with robust coding practices and incorporating monitoring tools, teams can carry models from the lab all the way to production.
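
As a rough illustration of that monitoring and quality-assurance idea (our own sketch, not code from the session), the snippet below validates incoming data and raises an alert when a model's live metric drifts below a baseline threshold:

```python
# Minimal sketch: data-quality gates plus a simple performance monitor.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def validate(X: np.ndarray) -> None:
    # Basic quality gates to run before training or scoring.
    assert not np.isnan(X).any(), "missing values detected"
    assert X.shape[1] == 4, "unexpected feature count"

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 4)), rng.integers(0, 2, 500)
X_prod, y_prod = rng.normal(size=(200, 4)), rng.integers(0, 2, 200)

validate(X_train)
model = LogisticRegression().fit(X_train, y_train)

# Monitoring: compare live performance against a baseline threshold.
validate(X_prod)
auc = roc_auc_score(y_prod, model.predict_proba(X_prod)[:, 1])
if auc < 0.7:  # hypothetical alerting threshold
    print(f"ALERT: AUC dropped to {auc:.2f}; retraining may be needed")
```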

With great power comes a great bill: optimize, govern and monitor your Azure costs – Kristijan Shirgoski  

“It is never too late to start.” In this session we went through several tips and recommendations, and saw how billing works for commonly used resources such as Data Factory, Databricks, SQL Databases, Synapse, Fabric, Log Analytics, Data Lake, and Virtual Machines.

We learned the latest best practices for saving costs in our cloud infrastructure, covering topics such as Azure policies, DBUs (Databricks Units), DSUs (Databricks Storage Units), tags, scaling up on demand, shared compute, auto-termination, spot instances, reservations, quotas, and infrastructure as code to optimize and monitor our Azure costs.
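
To make the tagging point concrete, here is a small sketch (ours, with a placeholder subscription ID and an invented tag policy) that uses the Azure SDK for Python to flag untagged resources, a common first step in cost governance:

```python
# Minimal sketch: list resources missing the tags that cost reports rely on.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

REQUIRED_TAGS = {"owner", "cost-center", "environment"}  # hypothetical policy

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

for resource in client.resources.list():
    tags = set((resource.tags or {}).keys())
    missing = REQUIRED_TAGS - tags
    if missing:
        print(f"{resource.type} '{resource.name}' is missing tags: {sorted(missing)}")
```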

“Today is the first day of the rest of your life.” From this session, I keep in mind the importance of monitoring our resources and activity in the cloud to improve performance and save costs through good practices.

Optimize your Azure Data & AI platform for peak performance: rethink security and stability – Wout Cardoen  

In the session I learned that modularity is crucial for staying ahead of the competition. This involves ensuring that specific data is handled appropriately, building a future-oriented data platform, and accelerating development processes.  

Security was highlighted with the principle “trust is good; control is better”. Key elements include managing identity and data access with a least-privilege approach, integrating secret management with Azure Key Vault, implementing network security through total lockdown, and adopting the four-eyes principle in DevOps security. Data quality was emphasized through the application of metadata constraints.
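
For the secret-management element, a minimal sketch with the Azure Key Vault SDK for Python looks like this (the vault URL and secret name are illustrative):

```python
# Minimal sketch: fetch a connection string from Key Vault at runtime
# instead of embedding it in code or configuration files.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://my-vault.vault.azure.net",  # hypothetical vault
    credential=DefaultAzureCredential(),  # e.g. a managed identity in production
)
secret = client.get_secret("sql-connection-string")  # hypothetical secret name
print(f"Retrieved secret '{secret.name}' (value withheld)")
```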

Finally, I was reminded to maintain order and cleanliness on the platform. Avoid temporary solutions or remove them promptly and ensure proper documentation. The importance of not overengineering the platform with unnecessary functionalities was also stressed, promoting efficiency and focusing on essential features.  

Power BI refreshes – reinvented!  – Marc Lelijveld    

This session explored the various refresh options available in Power BI, highlighting their advantages and the contexts in which they are best utilized. We examined different storage modes—Import, Direct Query, and Dual Mode—demonstrating how they can be combined in a composite model. We also discussed the importance of incremental refresh, including when and how to implement it effectively. Finally, we covered how to connect Power BI refreshes to other processes for centralized orchestration. Overall, this session provided valuable insights into optimizing data refresh strategies in Power BI.  
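
As an example of hooking a refresh into a wider orchestration, the Power BI REST API exposes a refreshes endpoint; the sketch below (workspace and dataset IDs are placeholders, and the access token is assumed to be acquired from Azure AD elsewhere) triggers a refresh and polls its status:

```python
# Minimal sketch: trigger a Power BI dataset refresh and poll until it finishes,
# so an orchestrator (e.g. Data Factory or Airflow) can sequence downstream steps.
import time

import requests

GROUP_ID = "<workspace-id>"        # placeholder
DATASET_ID = "<dataset-id>"        # placeholder
TOKEN = "<azure-ad-access-token>"  # assumed to be acquired elsewhere

base = f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}/datasets/{DATASET_ID}"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Kick off the refresh.
requests.post(f"{base}/refreshes", headers=headers).raise_for_status()

# Poll the most recent refresh entry until it completes.
while True:
    latest = requests.get(f"{base}/refreshes?$top=1", headers=headers).json()["value"][0]
    if latest["status"] != "Unknown":  # "Unknown" means still in progress
        print(f"Refresh finished with status: {latest['status']}")
        break
    time.sleep(30)
```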

What Fabric means for your Power BI semantic models   – Kurt Buhler 

I was thoroughly impressed by the session delivered by Kurt. His presentations always stand out with incredibly well-designed slides that have a unique and captivating visual style. The various scenarios he presented were especially interesting, as they allowed us to grasp each concept in-depth and explore possible solutions.  

Kurt explained how Microsoft Fabric introduces new features that will transform the way we build and use Power BI semantic models. He highlighted the importance of understanding these features and knowing how and when to apply them effectively. The session covered what a Power BI semantic model is, why it’s essential in Fabric, and explored three scenarios showing how teams are leveraging these features to address current Power BI challenges.  

In this talk, Kurt assumed a foundational understanding of features like Direct Lake storage mode, semantic link in notebooks, and Git integration. He focused more on the 'how' and 'why' of these tools, which added a layer of strategic thinking beyond just knowing what they do.
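
For readers unfamiliar with semantic link, here is a tiny sketch of what it enables inside a Fabric notebook via the sempy library (the dataset and measure names are made up):

```python
# Minimal sketch: query a Power BI semantic model from a Fabric notebook
# using semantic link (sempy), bridging BI models and data science code.
import sempy.fabric as fabric

# Discover the semantic models available in the workspace.
print(fabric.list_datasets())

# Evaluate a DAX measure against a model, grouped by a column.
df = fabric.evaluate_measure(
    dataset="Sales Model",             # hypothetical semantic model name
    measure="Total Revenue",           # hypothetical measure
    groupby_columns=["'Date'[Year]"],
)
print(df.head())
```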

By the end of the session, I had a much clearer understanding of how I might approach these new features for the semantic models. It was an incredibly valuable and engaging presentation!  

The sidekick’s guide to supercharging Power BI Heroes   – Paulina Jędrzejewska   

I really loved the presentation given by Paulina. She started by sharing her professional background and explained how her first mission at a client allowed her to quickly find a way to make a difference using Power BI. This set the stage for what was to come—a highly engaging and technical demo.  

The demo focused on Tabular Editor, showcasing the power of C# scripting and Calculation Groups, which was incredibly insightful. The idea was to demonstrate how Tabular Editor can save significant time in creating generic measures, adding descriptions, and more. Paulina walked us through how to automate and optimize processes, streamlining the development of efficient data models. 

In conclusion 

To wrap up, our experience at the seminar was truly enriching across all sessions. The diversity of topics and expertise has left us well-equipped with new ideas and strategies to apply in our work. A special thanks to all the organizers and speakers for making this event so impactful. The lessons learned will play a crucial role in driving our continued success. We look forward to attending future editions and further contributing to the growing knowledge within our industry!   

See you next time! 

Authors: Alexe Deverdenne, Sophie Opsommer, Hugo Henris, Martin Izquierdo, Pierre-Yves Richer, Thibaut De Carvalho


AKABI’s consultants share insights from Dataminds Connect 2024: part 1

October 18, 2024

AI Analytics Business Intelligence Data Integration Microsoft Azure


The Dataminds Connect 2024 event, held in the picturesque city of Mechelen, Belgium, is a highly anticipated three-day gathering for IT professionals and Microsoft data platform enthusiasts. This year, the focus was on innovative technologies, including Microsoft Fabric, Power BI, and Azure cloud solutions. The event provided an invaluable opportunity for our consultants to gain insights from leading experts in the field and stay abreast of the latest advancements in data management. In this two-part series, we will be sharing the key takeaways and experiences of the event. This first part will cover feedback from the first two days of the seminar, highlighting some of the most impactful sessions and insights.

Further insights and experiences will be shared in the second part of this series, which will cover the feedback from the third and final day of the event. This day was particularly valuable, offering even more lessons and cutting-edge discussions.

Upgrading Enterprise Power BI Architecture – Steve Campbell & Mathias Thierbach

During this training, we gained insight into the internal workings of Power BI, enabling us to optimize our models more effectively. The instructors explained how Power BI compresses data using techniques like run-length encoding and dictionaries (hash encoding for text fields and value encoding for numeric fields). By understanding these mechanisms, we learned how to structure our models to maximize compression efficiency, especially by managing column cardinality. For instance, limiting high-cardinality columns, favoring integer formats, and disabling attribute hierarchies are key steps in optimizing dataset performance.
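
As a small illustration of the cardinality point (our own example, not from the training), splitting a datetime column and preferring integers can shrink dictionaries dramatically before the data ever reaches Power BI:

```python
# Minimal sketch: reduce column cardinality ahead of import so Power BI's
# dictionary and run-length encoding can compress the data better.
import pandas as pd

df = pd.read_csv("sales.csv", parse_dates=["timestamp"])  # hypothetical extract

# A full datetime is near-unique per row; date + truncated time compress far better.
df["date"] = df["timestamp"].dt.normalize()
df["time_minute"] = df["timestamp"].dt.strftime("%H:%M")
df = df.drop(columns=["timestamp"])

# Prefer integer (value-encoded) columns over floats where precision allows.
df["amount_cents"] = (df["amount"] * 100).round().astype("int64")
df = df.drop(columns=["amount"])

print(df.nunique().sort_values(ascending=False))  # inspect cardinality per column
```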

Build a Microsoft Fabric Proof of Concept in a Day – Cathrine Wilhelmsen, Emilie Rønning & Marthe Moengen

I recently had the opportunity to attend the “Build a Microsoft Fabric Proof of Concept in a Day” seminar, hosted by Cathrine, Emilie, and Marthe. It was an extremely engaging experience. The three presenters contributed a wealth of knowledge from their distinct professional backgrounds, which greatly enhanced the training. It was particularly beneficial to gain insights from individuals occupying pivotal roles within the Fabric ecosystem. This approach enabled us to engage in critical analysis of key aspects such as Fabric’s technical architecture, data modeling, and data architecture design.

Saying no is OK – Sander Star

Saying no is not easy, but it is a crucial skill in professional settings, not merely a necessity. Knowing how to decline protects against depression, overwork, and reduced productivity. It also means maintaining a healthy work-life balance, establishing clear limits and boundaries, and staying consistent in terms of quality, effectiveness, and efficiency. This highly practical training course is suitable for all audiences and gives participants the opportunity to work through a variety of situations, with detailed explanations and guidance on how to handle them effectively.

Become a metadata-driven DBA – Magnus Ahlkvist

Are you a DBA whose day-to-day work is full of repetitive tasks, monitoring, and running scripts in different places? Then this course was made for you. Mr. Ahlkvist's slogan: 'Automation is about turning something boring and repetitive into something more fun'. To achieve this, he suggested combining dbatools and dbachecks with an overlay of Pode (which creates REST APIs in PowerShell).

SQL Server Infernals – A Beginner’s Guide to SQL Server Worst Practices – Gianluca Sartori

With no pre-requisites, this course provides a comprehensive overview of everything you need to avoid with databases. For young and old alike, it’s often a good idea to go back to basics, to remember the little things that have a big impact.

Fabric adoption roadmap: Napoleon's success story – Jo David

In a departure from the typical technological focus of our industry, Jo David invited us to immerse ourselves in the history of 18th- and 19th-century France through the story of Napoleon Bonaparte. Mr. David demonstrated that adopting Fabric in a company is a relatively straightforward process, comparable to the challenges of waging a war: once the key elements of success have been identified, it becomes easier to prepare for change.

What’s wrong with the Medallion Architecture? – Simon Whiteley  

Behind this big title, Simon Whiteley tackled a genuine issue that affects companies when layering their Lakehouses. The “medallion architecture” approach may not be the optimal solution for complex real-life data structures, and the distinction between layers may not be readily apparent to non-data collaborators. By presenting the broad stages of data curation in a step-by-step manner and emphasizing the importance of proper naming, Whiteley provided a more grounded approach to Lakehouse design that more closely aligns with the reality of data.   
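
A small sketch of that idea (hypothetical layer names, not Whiteley's own design): name each curation stage after what it actually does rather than after a medal, so non-data colleagues can follow the flow.

```python
# Minimal sketch: explicit, descriptive curation layers instead of bronze/silver/gold.
from pathlib import Path

import pandas as pd

LAYERS = [
    "01_raw_as_received",      # untouched source extracts
    "02_cleansed_typed",       # deduplicated, typed, validated
    "03_conformed_business",   # business rules and shared keys applied
    "04_presented_reporting",  # star schemas ready for BI tools
]

def promote(df: pd.DataFrame, layer: str, name: str, base: Path = Path("lakehouse")) -> Path:
    """Write a dataset into an explicitly named curation layer."""
    target = base / layer / f"{name}.parquet"
    target.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(target)
    return target

orders = pd.DataFrame({"order_id": [1, 2], "amount": [10.0, 20.0]})
print(promote(orders, LAYERS[1], "orders"))
```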

The Sixth Sense: Using Log Analytics to Efficiently Monitor Your Azure Environment – Abhinav Jayanty  

In this presentation, Jayanty outlined the general steps for developing the monitoring component of an Azure environment. He began by presenting the process for monitoring activity logs of Azure objects, querying resources using KQL (Kusto Query Language), and determining pricing options based on data retention requirements. The latter part provided visual examples of KQL queries on Azure objects to extract metrics, write logs to SQL tables, and implement message-based alerting. Given the extensive range of analytical tools available in Azure Monitor, it was not feasible for Jayanty to cover each one in detail. However, he provided a comprehensive overview of the monitoring tool and its integration within the Azure platform, which left attendees with a solid grasp of the subject matter.
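
To give a feel for the querying side, here is a minimal sketch (ours, with a placeholder workspace ID) that runs a KQL query against Log Analytics from Python using the azure-monitor-query package:

```python
# Minimal sketch: run a KQL query against a Log Analytics workspace from Python.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Count failed activity-log operations per resource group over the last day.
query = """
AzureActivity
| where ActivityStatusValue == 'Failure'
| summarize failures = count() by ResourceGroup
| top 5 by failures
"""

response = client.query_workspace(
    "<workspace-id>",  # placeholder Log Analytics workspace ID
    query,
    timespan=timedelta(days=1),
)
for table in response.tables:
    for row in table.rows:
        print(row)
```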

EFSA implements a data mesh at scale with Databricks: practical insights – Sebastiaan Leysen, Giancarlo Costa and Jan Van Meirvenne

In 2019, Zhamak Dehghani proposed the data mesh architecture, which suggests organizing domain-based teams (business and technical profiles) around a central data team with expertise in data ingestion. To more effectively accommodate the expansion in the number of sources, teams, and data, as well as the demand for greater business autonomy, bespoke data transformation and scheduling, the EFSA (European Food Safety Authority) teams have transitioned to a data mesh architecture for their data organisation. They outlined how data is shared with different teams using the new share functionality on Databricks, how teams are organized by domain, and the need for a data governance team that oversees security, access, and monitoring.
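
Assuming the share functionality referred to is Databricks Delta Sharing, the consuming team's side can be sketched with the open delta-sharing client (the profile file and table names are placeholders):

```python
# Minimal sketch: read a table that another domain team shared via Delta Sharing.
import delta_sharing

# The provider team distributes a credentials profile file to each consumer.
profile = "efsa_share.share"  # hypothetical profile file

client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())  # discover what has been shared with us

# Load one shared table as a pandas DataFrame: profile#share.schema.table
df = delta_sharing.load_as_pandas(f"{profile}#sources.lab_results.samples")
print(df.head())
```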

Effective Data Quality Checks and Monitoring with Databricks and Pandas – Esa Denaux  

Quality is defined as meeting a predefined standard and prioritizing both correctness and transparency. In the session on data quality using Pandas and Databricks, I explored strategies to ensure high data quality throughout the data lifecycle, using De Lijn’s reference architecture and technology stack as an example. During the session with Esa, we discussed the use of visualization techniques like histograms, box plots, and scatter plots for detecting anomalies. We also considered summary statistics and data quality reports as tools for gaining deeper insight into data quality. This session has provided me with a comprehensive approach to data quality management, from the initial profiling and validation of data sets to the deployment of automated testing and monitoring systems. By focusing on both technical validation (through Pandas and Databricks) and strategic practices (like naming conventions and business rule enforcement), organizations can ensure that their data remains a valuable and reliable asset.    
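
To ground this, here is a small sketch of rule-based checks and summary statistics with pandas (the dataset and rules are invented, not De Lijn's):

```python
# Minimal sketch: profile a dataset and run explicit data-quality rules,
# reporting which rules fail instead of silently passing bad data along.
import pandas as pd

df = pd.DataFrame({
    "trip_id": [1, 2, 3, 3],
    "line": ["A", "B", None, "C"],
    "duration_min": [12.0, -5.0, 30.0, 45.0],
})

print(df.describe(include="all"))  # summary statistics for a first look

rules = {
    "trip_id is unique": df["trip_id"].is_unique,
    "line has no nulls": df["line"].notna().all(),
    "duration is positive": (df["duration_min"] > 0).all(),
}

failed = [name for name, passed in rules.items() if not passed]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")
```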

Exploring the art of Conditional Formatting in Power BI – Anastasia Salari

I really appreciated the session given by Anastasia Salari. As an introduction, Anastasia explained the importance of conditional formatting through interactive examples that could easily be part of a business presentation to raise awareness on the use of appropriate visuals. She introduced effective techniques and uncovered the strategic value behind them, enhancing our understanding of both the ‘how’ and the ‘why.’ We learned how to use this simple yet powerful feature to streamline complex information and make reports not only visually appealing but also fundamentally more effective. Afterwards, a very interesting and detailed demo was given, showcasing a significant number of Power BI visuals featuring visual formatting. Anastasia demonstrated each time how she had implemented it, which gave us ideas for possible applications at the client’s site. The session provided immediate insights into how conditional formatting can improve how reports communicate data and elevate the overall impact of data visualization.

Optimizing Power BI Development: Unleashing the Potential of Developer Mode – Rui Romano

This session provided an insightful look into Developer Mode in Power BI, focusing on how it integrates developer-centric features such as source control and Azure DevOps. The presenter demonstrated how these tools enable better team collaboration and the creation of CI/CD pipelines, enhancing the scalability and reliability of Power BI projects. It was a very interesting presentation that highlighted powerful new features in Power BI, some of which are already partially available and will likely transform how we work with Power BI in the future.

In conclusion

This first part of our seminar feedback highlights just a glimpse of the rich knowledge and experiences we gained over this outstanding event. The insights shared were invaluable and have provided us with new perspectives on several key topics. Stay tuned for the second part, where we will continue to explore more of the seminars and share additional takeaways that will certainly fuel our future growth!

Authors: Alexe Deverdenne, Hugo Henris, Martin Izquierdo, Pierre-Yves Richer, Sophie Opsommer, Thibaut De Carvalho


Generative AI and LLMs for accessible information and optimized processes

November 28, 2023

AI Event


Last month, Medhi Famibelle, Pascal Nguyen, and I attended three talks at the offices of Le Wagon (a company offering data training courses), organized as part of a meet-up of the Generative AI Paris group. Unsurprisingly, we saw how prevalent AI, and LLM technologies in particular, has become across very different sectors: when mastered, they bring significant optimizations and time savings. You can find all the presentations here:

Meetup “Generative AI Paris” – 31 Octobre 2023 – YouTube

Here is a quick overview of the talks.

  • Using and optimizing the RAG method 🤖

Retrieval Augmented Generation (RAG) has become the flagship NLP technique for building Question & Answering systems that let users query data of diverse formats and sources in natural language. At Sicara, RAG was implemented as a Slack chatbot that answers questions about the company. RAG involves chunking documents so they can be vectorized and stored in a database, which makes it possible to measure their similarity to the question being asked.
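
To make the mechanics concrete, here is a minimal retrieval sketch (ours, not Sicara's implementation) using sentence-transformers; the documents and model choice are illustrative:

```python
# Minimal RAG retrieval sketch: chunk, embed, rank by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["...company handbook text...", "...HR policy text..."]  # placeholders
chunks = [c for d in docs for c in chunk(d)]
vectors = model.encode(chunks, normalize_embeddings=True)

question = "How many vacation days do employees get?"
q = model.encode([question], normalize_embeddings=True)[0]

scores = vectors @ q  # cosine similarity on normalized vectors
best = np.argsort(scores)[::-1][:3]
context = "\n".join(chunks[i] for i in best)  # passed to the LLM as context
```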

What is the difference between a POC and a system in production? For a POC, using a framework such as Langchain to manipulate the LLM is a good idea. You then need to choose the database: vector or not. The speaker recommends non-vector databases such as Postgres/Elasticsearch when the expected number of vectors is below a million; beyond that, dedicated vector databases such as ChromaDB or Qdrant exist.

Nothing beats having control over the model, in particular to refine its predictions by analyzing the output probabilities. According to the speaker, this is an advantage of open-source LLMs. However, depending on the volume of the knowledge base, a paid solution built on GPT, for example, may prove more economical and effective. To move from POC to production, it is very important to think about how the database's vectors are updated when documents are added or modified; this can be done with workflows in, for example, Airflow. Collecting and analyzing user inputs also helps verify that the tool is being used properly and that users are not left struggling with it. DVC can be useful for experimenting with different models. In short: testing and monitoring to improve the RAG's results is the right approach.

  • Generative AI in the service of video games 🎮

You may be familiar with the world of mobile games. At Popscreen, video game development has been considerably accelerated by using generative AI for creative control: generating images and text.

Image generation relies on SD1.5, Stable Diffusion, and LoRA models. They also use ControlNet to generate images from their artists' drawings: starting from a reference image (used for the texture) and a character (drawn by their artists), they can generate a range of units in just a few days thanks to Stable Diffusion. From around twenty illustrations made by their artists, Popscreen can train a LoRA model which, coupled with SD1.5, lets them create brand-new units from a prompt.
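
As a rough sketch of that workflow (not Popscreen's actual code), the diffusers library can load SD 1.5 together with a custom LoRA; the LoRA path and prompt are invented, and a CUDA GPU is assumed:

```python
# Minimal sketch: generate a game unit image with SD 1.5 plus a custom LoRA.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical LoRA trained on ~20 of the studio's own illustrations.
pipe.load_lora_weights("./lora_units")

image = pipe("a new warrior unit, game art style", num_inference_steps=30).images[0]
image.save("unit.png")
```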

On the text side, they rely on GPT and Langchain. These tools allow the company to generate various textual elements: dialogues, character class descriptions, and so on. Thanks to generative AI, the company estimates it produces in a few weeks content that would take several months to create by traditional means.

  • Generative AI in the service of pedagogy 📚

The last speaker, from Didask, showed us how LLMs allowed his e-learning company to save 12,000 days of work. They drew on the expertise of specialists in cognitive and educational science to determine how to structure information and take a 'learner first' approach for the users of an e-learning module.

This starts with identifying the main cognitive challenge behind the notions the e-learning module must convey. Deconstructing mistaken mental models? Put the learner in a realistic scenario. Building mental traces to memorize large amounts of information? Use flashcards.

The pedagogical AI selects the appropriate format for the content based on the cognitive challenge, generates the content, and then turns it into an interactive experience. All of this is done from unstructured documents fed into the pedagogical AI, which relies on LLMs, and notably RAG, to decide on the pedagogical objectives and the content for each format (flashcards, scenarios, etc.). It is all made possible by careful prompt engineering, grounded in the expertise of the cognitive and educational science specialists, which the LLM uses behind the scenes. 🧠

Generative AI, once known mainly for image generation, is making spectacular progress in natural language processing and is being used more and more, with highly promising results. At AKABI, we keep a close watch on advances in this field so we can address business challenges and the new use cases emerging every day. 🚀

Nicolas Baouaya, IA & Data Science Consultant
