AKABI’s consultants share insights from Dataminds Connect 2024: part 2

November 4, 2024

AI Analytics Business Intelligence Data Integration Event Microsoft Azure

Welcome to the second part of our Dataminds Connect 2024 recap! After covering the first two days of this event in our initial article, we’re excited to share our feedback from the final day of the conference. This concluding day proved especially valuable, with in-depth sessions on Microsoft Fabric, Power BI, and Azure cloud solutions, providing practical perspectives for our ongoing and future projects. Join us as we explore the key highlights, lessons learned, and impactful discussions from the last Dataminds Connect.

The Power of Paginated Reports – Nico Jacobs

As we all know, paginated reports are the evolution of a long-established technology: SSRS (SQL Server Reporting Services). But that doesn’t mean they should be considered legacy! The format still has a lot to offer, and Mr. Jacobs illustrated this beautifully with five fundamental advantages, including export options, component nesting, and source flexibility.

Disaster Recovery Strategies for SQL Server – Andrew Pruski

“A pessimist is an optimist with experience”, “Hope is not a strategy” (the Google SRE team motto), “Businesses don’t care about SQL Server or Oracle, they care about data”: these are just a few of the key phrases used to raise awareness of the importance of a contingency plan in the event of a technical problem. Solutions and safeguards were then proposed to prevent the main bad practices. The most important thing to remember is that the question is not whether your database is backed up, but how, and how quickly, the backup can be restored and made operational.

The Renaissance of Microsoft Purview – Tom Bouten

“If DATA is the new OIL, then METADATA will make you RICH” could be the tagline of any data lineage tool. This is how Mr. Bouten introduced Purview. The tool wasn’t great when it first came out, but it keeps getting better. It’s worth keeping an eye on because it automates more and more processes and discoveries, and it will be used in more and more functions within a company. Thanks for the presentation and the refresher.

Start 2 MLOps: From the lab to production – Nick Verhelst

In this MLOps session, we explored the machine learning lifecycle, emphasizing essential aspects like clear problem definitions, stakeholder alignment, and the importance of monitoring and quality assurance. These are foundational to ensuring successful outcomes in machine learning projects.

We also discussed the double diamond design process and its role in business and data understanding, showing how alternating between problem definition and solution exploration helps guide the ML lifecycle.

The session gave me a comprehensive overview of the ML project lifecycle, stressing the importance of structure, collaboration, and the right tools, as well as of balancing creative exploration with robust coding practices and incorporating monitoring tools.

With great power comes a great bill: optimize, govern and monitor your Azure costs – Kristijan Shirgoski

“It is never too late to start.” In this session we discussed several tips and recommendations, and how billing works for very commonly used resources such as Data Factory, Databricks, SQL Databases, Synapse, Fabric, Log Analytics, Data Lake, Virtual Machines, etc.

We learned the newest best practices for saving costs in our cloud infrastructure, covering subjects like Azure policies, DBU (Databricks Unit), DSU (Databricks Storage Unit), tags, scaling up on demand, shared compute, auto-termination, spot instances, reservations, quotas, and infrastructure as code to optimize and monitor our Azure costs.

“Today is the first day of the rest of your life.” From this session I will keep in mind how important it is to monitor our cloud resources and activity in order to improve performance and save costs through good practices.

Optimize your Azure Data & AI platform for peak performance: rethink security and stability – Wout Cardoen

In the session I learned that modularity is crucial for staying ahead of the competition. This involves ensuring that specific data is handled appropriately, building a future-oriented data platform, and accelerating development processes.

Security was highlighted with the principle “trust is good; control is better”. Key elements include managing identity and data access with a least-privilege approach, integrating secret management with Azure Key Vault, implementing network security through total lockdown, and adopting the four-eyes principle in DevOps security. Data quality was emphasized through the application of metadata constraints.

Finally, I was reminded to maintain order and cleanliness on the platform. Avoid temporary solutions or remove them promptly and ensure proper documentation. The importance of not overengineering the platform with unnecessary functionalities was also stressed, promoting efficiency and focusing on essential features.

Power BI refreshes – reinvented! – Marc Lelijveld

This session explored the various refresh options available in Power BI, highlighting their advantages and the contexts in which they are best used. We examined the different storage modes (Import, DirectQuery, and Dual) and saw how they can be combined in a composite model. We also discussed the importance of incremental refresh, including when and how to implement it effectively. Finally, we covered how to connect Power BI refreshes to other processes for centralized orchestration. Overall, this session provided valuable insights into optimizing data refresh strategies in Power BI.

What Fabric means for your Power BI semantic models – Kurt Buhler

I was thoroughly impressed by the session delivered by Kurt. His presentations always stand out with incredibly well-designed slides that have a unique and captivating visual style. The various scenarios he presented were especially interesting, as they allowed us to grasp each concept in-depth and explore possible solutions.

Kurt explained how Microsoft Fabric introduces new features that will transform the way we build and use Power BI semantic models. He highlighted the importance of understanding these features and knowing how and when to apply them effectively. The session covered what a Power BI semantic model is, why it’s essential in Fabric, and explored three scenarios showing how teams are leveraging these features to address current Power BI challenges.

In this talk, Kurt assumed a foundational understanding of features like Direct Lake storage mode, semantic link in notebooks, and Git integration. He focused more on the ‘how’ and ‘why’ of these tools, which added a layer of strategic thinking beyond just knowing what they do.

By the end of the session, I had a much clearer understanding of how I might approach these new features for the semantic models. It was an incredibly valuable and engaging presentation!

The sidekick’s guide to supercharging Power BI Heroes – Paulina Jędrzejewska

I really loved the presentation given by Paulina. She started by sharing her professional background and explained how her first mission at a client allowed her to quickly find a way to make a difference using Power BI. This set the stage for what was to come—a highly engaging and technical demo.

The demo focused on Tabular Editor, showcasing the power of C# scripting and Calculation Groups, which was incredibly insightful. The idea was to demonstrate how Tabular Editor can save significant time in creating generic measures, adding descriptions, and more. Paulina walked us through how to automate and optimize processes, streamlining the development of efficient data models.

In conclusion

To wrap up, our experience at the seminar was truly enriching across all sessions. The diversity of topics and expertise has left us well-equipped with new ideas and strategies to apply in our work. A special thanks to all the organizers and speakers for making this event so impactful. The lessons learned will play a crucial role in driving our continued success. We look forward to attending future editions and further contributing to the growing knowledge within our industry!

See you next time!

Authors: Alexe Deverdenne, Sophie Opsommer, Hugo Henris, Martin Izquierdo, Pierre-Yves Richer, Thibaut De Carvalho

AKABI’s consultants share insights from Dataminds Connect 2024: part 1

October 18, 2024

AI Analytics Business Intelligence Data Integration Microsoft Azure

The Dataminds Connect 2024 event, held in the picturesque city of Mechelen, Belgium, is a highly anticipated three-day gathering for IT professionals and Microsoft data platform enthusiasts. This year, the focus was on innovative technologies, including Microsoft Fabric, Power BI, and Azure cloud solutions. The event provided an invaluable opportunity for our consultants to gain insights from leading experts in the field and stay abreast of the latest advancements in data management. In this two-part series, we will be sharing the key takeaways and experiences of the event. This first part will cover feedback from the first two days of the seminar, highlighting some of the most impactful sessions and insights.

Further insights and experiences will be shared in the second part of this series, which will cover the feedback from the third and final day of the event. This day was particularly valuable, offering even more lessons and cutting-edge discussions.

Upgrading Enterprise Power BI Architecture – Steve Campbell & Mathias Thierbach

During this training, we gained insight into the internal workings of Power BI, enabling us to optimize our models more effectively. The instructor explained how Power BI compresses data using techniques like run-length encoding and dictionaries (hash encoding for text fields and value encoding for numeric fields). By understanding these mechanisms, we learned how to structure our models to maximize compression efficiency, especially by managing column cardinality. For instance, limiting high-cardinality columns, favoring integer formats, and disabling attribute hierarchies are key steps in optimizing dataset performance.
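
To make the cardinality point concrete, here is a minimal Python sketch of dictionary encoding followed by run-length encoding. It illustrates the general techniques only and is not Power BI’s actual storage engine; the function names and the sample columns are hypothetical.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each distinct value with a small integer ID."""
    dictionary = {}
    ids = []
    for value in values:
        # setdefault assigns the next free ID the first time a value is seen.
        ids.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, ids


def run_length_encode(ids):
    """Collapse consecutive repeats into (id, run_length) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(ids)]


# Hypothetical columns: a sorted low-cardinality one and a unique-per-row one.
country = ["BE", "BE", "BE", "FR", "FR", "NL", "NL", "NL"]
order_id = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]

country_dict, country_ids = dictionary_encode(country)
order_dict, order_ids = dictionary_encode(order_id)

country_runs = run_length_encode(country_ids)
order_runs = run_length_encode(order_ids)

print(len(country_dict), country_runs)   # 3 dictionary entries, 3 runs
print(len(order_dict), len(order_runs))  # 8 dictionary entries, 8 runs
```

The low-cardinality column shrinks to three dictionary entries and three runs, while the unique column gains nothing from either step, which is why high-cardinality columns are the first candidates to remove or split.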

Build a Microsoft Fabric Proof of Concept in a Day – Cathrine Wilhelmsen, Emilie Rønning & Marthe Moengen

I recently had the opportunity to attend the “Build a Microsoft Fabric Proof of Concept in a Day” seminar, hosted by Cathrine, Emilie, and Marthe. It was an extremely engaging experience. The three presenters contributed a wealth of knowledge from their distinct professional backgrounds, which greatly enhanced the training. It was particularly beneficial to gain insights from individuals occupying pivotal roles within the Fabric ecosystem. This approach enabled us to engage in critical analysis of key aspects such as Fabric’s technical architecture, data modeling, and data architecture design.

Saying no is OK – Sander Star

While declining is not an easy task, it is a crucial skill in professional settings. Knowing how to say no offers protection from depression, overwork, and reduced productivity. It also means maintaining a healthy work-life balance, establishing clear limits and boundaries, and staying consistent in quality, effectiveness, and efficiency. This highly practical training course is suitable for all audiences and lets participants work through a variety of situations, with detailed explanations of each and guidance on how to handle them effectively.

Become a metadata driven DBA – Magnus Ahlkvist

Are you a DBA whose day-to-day work is full of repetitive tasks, monitoring and running scripts in different places? Then this course was made for you. The speaker’s slogan: ‘Automation is about turning something boring and repetitive into something more fun’. To achieve this, Mr Ahlkvist suggested combining dbatools and dbachecks with an overlay of Pode (which creates REST APIs in PowerShell).

SQL Server Infernals – A Beginner’s Guide to SQL Server Worst Practices – Gianluca Sartori

With no pre-requisites, this course provides a comprehensive overview of everything you need to avoid with databases. For young and old alike, it’s often a good idea to go back to basics, to remember the little things that have a big impact.

Fabric: adoption roadmap: Napoleon’s success story – Jo David

In a departure from the typical technological focus of our industry, Jo David invited us to immerse ourselves in the history of 18th and 19th century France through the story of Napoleon Bonaparte. Mr. David argued that adopting Fabric in a company, though comparable to the challenges of waging a war, becomes relatively straightforward once the key elements of success have been identified, since it is then easier to prepare for change.

What’s wrong with the Medallion Architecture? – Simon Whiteley

Behind this big title, Simon Whiteley tackled a genuine issue that affects companies when layering their Lakehouses. The “medallion architecture” approach may not be the optimal solution for complex real-life data structures, and the distinction between layers may not be readily apparent to non-data collaborators. By presenting the broad stages of data curation in a step-by-step manner and emphasizing the importance of proper naming, Whiteley provided a more grounded approach to Lakehouse design that more closely aligns with the reality of data.

The Sixth Sense: Using Log Analytics to Efficiently Monitor Your Azure Environment – Abhinav Jayanty

In this presentation, Jayanty outlined the general steps for developing the monitoring component of an Azure environment. He began by presenting the process for monitoring activity logs of Azure objects, querying resources using KQL (Kusto Query Language), and determining pricing options based on data retention requirements. The latter part provided visual examples of KQL queries on Azure objects to extract metrics, log onto SQL tables, and implement message-based alerting. Given the extensive range of analytical tools available in Azure Monitor, it was not feasible for Jayanty to cover each one in detail. However, he provided a comprehensive overview of the monitoring tool and its integration within the Azure platform, which left attendees with a solid grasp of the subject matter.

EFSA implements a data mesh at scale with Databricks practical insights – Sebastiaan Leysen, Giancarlo Costa and Jan Van Meirvenne

In 2019, Zhamak Dehghani proposed the data mesh architecture, which suggests organizing domain-based teams (business and technical profiles) around a central data team with expertise in data ingestion. To more effectively accommodate the expansion in the number of sources, teams, and data, as well as the demand for greater business autonomy, bespoke data transformation and scheduling, the EFSA (European Food Safety Authority) teams have transitioned to a data mesh architecture for their data organisation. They outlined how data is shared with different teams using the new share functionality on Databricks, how teams are organized by domain, and the need for a data governance team that oversees security, access, and monitoring.

Effective Data Quality Checks and Monitoring with Databricks and Pandas – Esa Denaux

Quality is defined as meeting a predefined standard and prioritizing both correctness and transparency. In the session on data quality using Pandas and Databricks, I explored strategies to ensure high data quality throughout the data lifecycle, using De Lijn’s reference architecture and technology stack as an example. During the session with Esa, we discussed the use of visualization techniques like histograms, box plots, and scatter plots for detecting anomalies. We also considered summary statistics and data quality reports as tools for gaining deeper insight into data quality. This session has provided me with a comprehensive approach to data quality management, from the initial profiling and validation of data sets to the deployment of automated testing and monitoring systems. By focusing on both technical validation (through Pandas and Databricks) and strategic practices (like naming conventions and business rule enforcement), organizations can ensure that their data remains a valuable and reliable asset.
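
As an illustration of the anomaly detection idea behind a box plot, the sketch below flags values outside the usual 1.5 × IQR whiskers and builds a small quality report for one column. It is not taken from the session: it uses only the Python standard library so it runs anywhere, and the function names and sample data are hypothetical. In Pandas the same statistics would typically come from `DataFrame.describe()` and `Series.quantile()`.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def quality_report(values):
    """Summarise one numeric column: row count, missing values, outliers."""
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "outliers": iqr_outliers(present),
    }


# Hypothetical sample: delays in minutes, with one gap and one suspicious value.
delays = [2, 3, 3, 4, 5, None, 4, 3, 60, 2]
report = quality_report(delays)
print(report)
```

A report like this can be computed per column after each load and compared against agreed thresholds, which is one simple way to turn the profiling step into an automated check.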

Exploring the art of Conditional Formatting in Power BI – Anastasia Salari

I really appreciated the session given by Anastasia Salari. As an introduction, Anastasia explained the importance of conditional formatting through interactive examples that could easily be part of a business presentation to raise awareness on the use of appropriate visuals. She introduced effective techniques and uncovered the strategic value behind them, enhancing our understanding of both the ‘how’ and the ‘why.’ We learned how to use this simple yet powerful feature to streamline complex information and make reports not only visually appealing but also fundamentally more effective. Afterwards, a very interesting and detailed demo was given, showcasing a significant number of Power BI visuals featuring visual formatting. Anastasia demonstrated each time how she had implemented it, which gave us ideas for possible applications at the client’s site. The session provided immediate insights into how conditional formatting can improve how reports communicate data and elevate the overall impact of data visualization.

Optimizing Power BI Development: Unleashing the Potential of Developer Mode – Rui Romano

This session provided an insightful look into Developer Mode in Power BI, focusing on how it integrates developer-centric features such as source control and Azure DevOps. The presenter demonstrated how these tools enable better team collaboration and the creation of CI/CD pipelines, enhancing the scalability and reliability of Power BI projects. It was a very interesting presentation that highlighted powerful new features in Power BI, some of which are already partially available and will likely transform how we work with Power BI in the future.

In conclusion

This first part of our seminar feedback highlights just a glimpse of the rich knowledge and experiences we gained over this outstanding event. The insights shared were invaluable and have provided us with new perspectives on several key topics. Stay tuned for the second part, where we will continue to explore more of the seminars and share additional takeaways that will certainly fuel our future growth!

Authors: Alexe Deverdenne, Hugo Henris, Martin Izquierdo, Pierre-Yves Richer, Sophie Opsommer, Thibaut De Carvalho

Enhancing Real-Time Data Processing with Databricks: Apache Kafka vs. Apache Pulsar

May 28, 2024

Analytics Data Integration Microsoft Azure

In the era of big data, real-time data processing is essential for organizations seeking immediate insights and the ability to respond swiftly to changing market conditions. Apache Kafka and Apache Pulsar are two of the most popular platforms for managing streaming data. Their integration with Databricks, a powerful analytics platform built on Apache Spark, enhances these capabilities, providing robust solutions for real-time data management. This article explores the features of Kafka and Pulsar, compares their strengths, and provides guidance on which to choose based on specific use cases.

Apache Kafka: A Standard in Data Streaming

Apache Kafka is a distributed streaming platform originally developed by LinkedIn and later donated to the Apache Software Foundation. Kafka’s architecture is based on a distributed log, where data is written to “topics” divided into partitions.

Topics act as categories for data streams, while partitions are the individual logs that store records sequentially. This division allows Kafka to scale horizontally, enabling high throughput and parallel processing. Partitions also ensure fault tolerance by replicating data across multiple brokers, which maintains data integrity and availability.
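
The relationship between topics, partitions, and record order can be sketched with a toy in-memory model. This is an illustration only, not Kafka client code: the `Topic` class and the sample keys are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka’s default partitioner applies to record keys.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy model of a topic: a fixed number of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def partition_for(self, key):
        # Stand-in hash: CRC32 of the key, modulo the partition count.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append(self, key, value):
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


topic = Topic("page-views", num_partitions=3)
first = topic.append("user-42", "home")
second = topic.append("user-42", "checkout")
other = topic.append("user-7", "home")

# Records with the same key share a partition, so their relative order is kept.
print(first, second, other)
```

Because each partition is an independent log, consumers can read partitions in parallel, while ordering is only guaranteed within a single partition.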

Kafka excels in scenarios that require rapid ingestion and real-time processing of large volumes of data. Its ecosystem includes Kafka Streams for stream processing, Kafka Connect for integrating various data sources, and KSQL for querying data streams with SQL-like syntax. These features make Kafka ideal for applications such as monitoring, log aggregation, and real-time analytics.

Key Features of Kafka:

High Throughput and Low Latency: Capable of handling millions of messages per second with minimal delay, making it suitable for applications that require quick data processing.
Durable Storage: Messages can be stored for a configurable retention period, allowing for replay and historical analysis.
Mature Ecosystem: Includes robust tools for stream processing, data integration, and real-time querying.

Apache Pulsar: The Next Generation of Streaming

Apache Pulsar is a distributed messaging and streaming platform developed by Yahoo and now managed by the Apache Software Foundation. Pulsar’s architecture separates message delivery from storage using a two-tier system comprising Brokers and BookKeeper nodes. This design enhances flexibility and scalability.

Brokers handle the reception and delivery of messages, while BookKeeper nodes manage persistent storage. ZooKeeper plays a crucial role in this architecture by coordinating the metadata and configuration management. This separation allows Pulsar to scale storage independently from message handling, providing efficient resource management and improved fault tolerance. Brokers ensure smooth data flow, BookKeeper nodes ensure data durability, and ZooKeeper maintains system coordination and consistency.

Pulsar supports advanced features such as multi-tenancy, geographic replication, and transaction support. Its multi-tenant capabilities allow multiple teams to share the same infrastructure without interference, making Pulsar suitable for complex, large-scale applications. Additionally, Pulsar supports various APIs and protocols, facilitating seamless integration with different systems.

Key Features of Pulsar:

Multi-Tenancy: Supports multiple tenants with resource isolation and quotas, providing efficient resource management.
Advanced Features: Includes geographic replication for data availability across data centers and transaction support for consistent message delivery.
Flexible Integrations: Supports various APIs and protocols, enabling easy integration with different systems.
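
Multi-tenancy is visible in Pulsar’s topic naming scheme, where a fully qualified name has the form `persistent://tenant/namespace/topic`. The following sketch parses such names to show how two teams can use the same topic name without clashing. The `TopicName` type, the `parse_topic` helper, and the tenant names are hypothetical and are not part of any Pulsar client library.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str

    def full_name(self):
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(name):
    """Split a fully qualified topic name into its four parts."""
    domain, _, rest = name.partition("://")
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown domain in {name!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


# Hypothetical tenants: two teams sharing one cluster without name clashes.
finance = parse_topic("persistent://finance/billing/invoices")
marketing = parse_topic("persistent://marketing/web/invoices")

print(finance.tenant, marketing.tenant)
print(finance.full_name())
```

Quotas and isolation policies are attached at the tenant and namespace levels, so the name itself tells an operator which team’s limits apply to a given topic.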

Comparing Apache Kafka and Apache Pulsar

While both Kafka and Pulsar are designed for real-time data streaming, they have distinct characteristics that may make one more suitable than the other depending on specific use cases.

Performance and Scalability: Kafka is known for its high throughput and low latency, making it ideal for applications requiring rapid data ingestion and processing. It is well-suited for high-performance use cases where low latency is critical. Pulsar, on the other hand, offers similar performance levels but excels in scenarios requiring multi-tenancy and seamless scaling. Its architecture separating compute and storage makes Pulsar preferable for applications needing flexible scaling and multi-tenant support.

Architecture and Flexibility: Kafka uses a simpler, monolithic architecture which can be easier to deploy and manage for straightforward use cases. This simplicity can be advantageous for quick and efficient setup. In contrast, Pulsar’s two-tier architecture provides more flexibility, especially for applications requiring geographic replication and fine-grained resource management. Pulsar is better suited for complex architectures needing advanced features.

Feature Set: Kafka’s extensive ecosystem, including tools like Kafka Streams, Kafka Connect, and KSQL, makes it a comprehensive solution for stream processing and real-time querying. This makes Kafka ideal for use cases that leverage its mature set of tools. Pulsar includes advanced features like native multi-tenancy, message replication across data centers, and built-in transaction support. These features make Pulsar preferable for applications requiring sophisticated capabilities.

Community and Ecosystem: Kafka has a larger and more mature ecosystem with widespread adoption across various industries, making it a safer bet for long-term projects needing extensive community support. Pulsar, while rapidly growing, offers cutting-edge features particularly appealing for cloud-native and multi-cloud environments. Pulsar is more appropriate for modern, cloud-native applications.

Integration with Databricks

Databricks, built on Apache Spark, leverages both Kafka and Pulsar to provide powerful and scalable real-time data processing capabilities. Here’s how these integrations enhance Databricks:

Databricks offers built-in connectors for reading data from and writing data to Kafka, enabling users to build real-time data pipelines using Spark Structured Streaming. This facilitates the transformation and analysis of data streams in real time.
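
As a rough sketch of what such a pipeline’s entry point can look like, the helper below assembles the options for Spark’s `kafka` source and passes them to `spark.readStream`. The broker address, topic name, and function names are assumptions made for illustration; `spark` is expected to be the SparkSession that a Databricks notebook already provides, so only the option-building part runs outside a cluster.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Build the option map for Spark's Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def read_kafka_stream(spark, options):
    """Return a streaming DataFrame of Kafka records.

    `spark` is an existing SparkSession (provided by a Databricks notebook).
    """
    return (
        spark.readStream
        .format("kafka")
        .options(**options)
        .load()
        # Kafka delivers key and value as binary; cast them for readability.
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )


# Hypothetical broker address and topic name.
options = kafka_source_options("broker-1:9092", "page-views", "earliest")
print(options)
```

In a notebook, `read_kafka_stream(spark, options)` would return a streaming DataFrame that can be transformed and then written out with `writeStream`.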

Similarly, Databricks supports Apache Pulsar, allowing for real-time data streaming with exactly-once processing semantics. Pulsar’s features such as geographic replication and transaction support enhance the resilience and reliability of streaming applications on Databricks.

Benefits of Integration

Integrating Kafka and Pulsar with Databricks provides several benefits. The scalability of both platforms allows for handling large volumes of real-time data without compromising performance. Pulsar’s multi-tenant capabilities and Kafka’s extensive features provide flexible integration tailored to specific business needs. Databricks also offers robust tools for access management and data governance, enhancing the security and reliability of streaming solutions.

Conclusion

Integrating Kafka and Pulsar with Databricks allows organizations to leverage leading streaming technologies to build efficient and scalable real-time data pipelines. By combining the power of Spark with Kafka’s resilience and Pulsar’s flexibility, Databricks provides a robust platform to meet the growing needs of real-time data processing.

For high-speed, low-latency applications, Kafka is the preferred choice. For complex, multi-tenant environments requiring advanced features like geographic replication and transaction support, Pulsar is more suitable.

Author: Pierre-Yves RICHER, Data Engineering Practice Leader at AKABI

Revolutionizing Data Engineering: The Power of Databricks’ Delta Live Tables and Unity Catalog

February 20, 2024

Business Intelligence Data Integration Microsoft Azure

Read in 5 minutes

Databricks has emerged as a pivotal platform in the data engineering landscape, offering a comprehensive suite of tools designed to tackle the complexities of data processing, analytics, and machine learning at scale. Among its innovative offerings, Delta Live Tables (DLT) and Unity Catalog stand out as transformative features that significantly enhance the efficiency and reliability of data pipelines. This article delves into these concepts, elucidating their functionalities, benefits, and their particular relevance to data engineers.

Delta Live Tables (DLT): Revolutionizing Data Pipelines

Delta Live Tables is an ETL framework built on top of Databricks, designed to streamline the development and maintenance of data pipelines. With DLT, data engineers can define declarative pipelines that automatically manage complex data transformations, dependencies, and error handling. This high-level abstraction allows engineers to focus on business logic and data transformations rather than the operational complexities of pipeline orchestration.

Key Features and Advantages:

Declarative Syntax: DLT allows data engineers to define transformations using SQL or Python, specifying what the data should look like rather than how to achieve it. This declarative approach simplifies pipeline development and maintenance.
Automated Error Handling: DLT provides error handling mechanisms such as automatic retries and detailed logging of pipeline events. This reduces the time data engineers spend on debugging and fixing pipeline issues.
Data Quality Controls: With DLT, data engineers can embed data quality checks directly into their pipelines, ensuring that data meets specified quality constraints before it moves downstream. This built-in validation mechanism enhances data reliability and trustworthiness.
Live Tables: DLT continuously monitors for new data and incrementally updates its outputs, ensuring that downstream users and applications always have access to fresh, high-quality data. This real-time processing capability is crucial for time-sensitive analytics and decision-making.
Change Data Capture (CDC): DLT supports the capture of changes made to source data, enabling seamless and efficient integration of updates into data pipelines. This feature ensures that data reflects the latest changes, crucial for accurate analytics and real-time reporting.
Historical and Live Views: Data engineers can create views that either maintain a history of data changes or display the most current data. This allows users to access data snapshots over time or see the present state of data, thereby facilitating thorough analysis and informed decision-making.
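
The declarative style and the embedded quality checks described above can be imitated in a few lines of plain Python. In a real DLT pipeline the table would be a function decorated with `@dlt.table` and the checks would be expectations such as `@dlt.expect_or_drop`; the sketch below does not use the `dlt` module (which only exists inside a Databricks pipeline) but mimics the idea so that it can run anywhere. All names and sample rows are hypothetical.

```python
dropped_counts = {}


def expect_or_drop(name, predicate):
    """Decorator: keep only rows that satisfy `predicate`, counting the rest."""
    def decorator(table_fn):
        def wrapper():
            rows = table_fn()
            kept = [row for row in rows if predicate(row)]
            dropped_counts[name] = len(rows) - len(kept)
            return kept
        return wrapper
    return decorator


def raw_orders():
    # Hypothetical source rows; one has no id and one has a negative amount.
    return [
        {"id": 1, "amount": 20.0},
        {"id": None, "amount": 15.0},
        {"id": 3, "amount": -5.0},
    ]


@expect_or_drop("positive_amount", lambda row: row["amount"] > 0)
@expect_or_drop("valid_id", lambda row: row["id"] is not None)
def clean_orders():
    return raw_orders()


rows = clean_orders()
print(rows)
print(dropped_counts)
```

The table definition states which rows are acceptable and leaves the filtering and bookkeeping to the framework, which is the essence of the declarative approach.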

Unity Catalog: Centralizing Data Governance

Unity Catalog enhances Databricks by introducing a unified governance framework for all data and AI assets in the Lakehouse, centralizing metadata management, access control, and auditing to streamline data governance and security at scale.

A data catalog acts as an organized inventory for an organization’s data assets, providing metadata, usage, and source information to facilitate data discovery and management. Unity Catalog realizes this by integrating with the Databricks Lakehouse, offering not just a cataloging function but also a unified approach to governance. This ensures consistent security policies, simplifies data access management, and supports comprehensive auditing, helping organizations navigate their data landscape more efficiently and in compliance with regulatory requirements.

Key Features and Advantages:

Unified Metadata Management: Unity Catalog consolidates metadata across various data assets, including tables, files, and machine learning models, providing a single source of truth for data governance.
Fine-grained Access Control: With Unity Catalog, data engineers can define precise access controls at the column, row, and table levels, ensuring that sensitive data is adequately protected and compliance requirements are met.
Cross-Service Policy Enforcement: Unity Catalog applies consistent governance policies across different Databricks workspaces and services, ensuring uniform security and compliance posture across the data landscape.
Data Discovery and Lineage: It facilitates easy discovery of data assets and provides comprehensive lineage information, enabling data engineers to understand data origins, transformations, and dependencies. This transparency is vital for troubleshooting, impact analysis, and compliance auditing.
Auditing: This feature tracks data interactions, offering insights into user activities and changes within the Databricks environment. This facilitates compliance and security by providing a detailed audit trail for accountability and analysis.
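
The fine-grained access control described above can be pictured with a small toy model in plain Python. This is not the Unity Catalog API: the `POLICY` structure, the group names, and the `read_table` helper are hypothetical and only show how a table-level privilege, a row filter, and a column mask combine when a user reads data.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical policy for a single table.
POLICY = {
    # Groups allowed to read the table at all (table-level privilege).
    "select_groups": {"analysts", "finance"},
    # Column -> groups that may see the column unmasked (column-level control).
    "masked_columns": {"email": {"finance"}},
    # Group -> predicate limiting the visible rows (row-level control).
    "row_filter": {"analysts": lambda r: r["region"] == "EU"},
}


def read_table(rows: List[Row], group: str) -> List[Row]:
    """Return the rows and column values the given group is allowed to see."""
    if group not in POLICY["select_groups"]:
        raise PermissionError(f"group {group!r} has no SELECT privilege")
    row_ok = POLICY["row_filter"].get(group, lambda r: True)
    result: List[Row] = []
    for row in rows:
        if not row_ok(row):
            continue
        visible = {}
        for col, val in row.items():
            allowed = POLICY["masked_columns"].get(col)
            # Mask the value when the column is protected and the group is not allowed.
            visible[col] = "***" if allowed is not None and group not in allowed else val
        result.append(visible)
    return result


# Hypothetical sample data.
customers = [
    {"id": 1, "email": "a@example.com", "region": "EU"},
    {"id": 2, "email": "b@example.com", "region": "US"},
]

analyst_view = read_table(customers, "analysts")
finance_view = read_table(customers, "finance")
print(analyst_view)
print(finance_view)
```

Here the analysts group sees only the EU row with the email masked, the finance group sees every row unmasked, and any other group is rejected. In a governed platform such rules are declared once in the catalog rather than re-implemented in each pipeline or report.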

Integration: Synergy Between DLT and Unity Catalog

The integration of Delta Live Tables and Unity Catalog within Databricks provides a cohesive and powerful environment for data engineering. DLT’s streamlined pipeline management, combined with Unity Catalog’s robust governance framework, offers a comprehensive solution for building, managing, and securing data pipelines at scale.

Enhanced Data Reliability: DLT’s real-time processing and data quality checks, coupled with Unity Catalog’s governance capabilities, ensure that data pipelines produce accurate, reliable, and compliant data outputs.
Increased Productivity: The declarative nature of DLT and the centralized governance of Unity Catalog reduce the complexity and overhead associated with data pipeline development and management, allowing data engineers to focus on delivering value.
Scalability and Flexibility: Both DLT and Unity Catalog are designed to scale with the needs of the business, accommodating large volumes of data and complex data transformations without sacrificing performance or manageability.

Conclusion: Empowering Data Engineers

For data engineers, the combination of Delta Live Tables and Unity Catalog within Databricks represents a significant leap forward in terms of productivity, data quality, and governance. By abstracting away the complexities of pipeline development and data management, these features allow engineers to concentrate on solving business problems through data. The result is a more efficient, reliable, and secure data infrastructure that can drive insights and innovation at scale. As the data landscape continues to evolve, tools like DLT and Unity Catalog will be indispensable in empowering data engineers to meet the challenges of tomorrow.

It’s important to note that, although Delta Live Tables (DLT) and Unity Catalog are designed to work together seamlessly within the Databricks environment, it’s perfectly viable to pair DLT with a different data cataloging system. This versatility allows organizations to take advantage of DLT’s sophisticated capabilities for automating and managing data pipelines while still utilizing another data catalog that may align more closely with their existing infrastructure or specific needs. Databricks supports this flexible data management strategy, enabling businesses to leverage DLT’s real-time processing and data quality enhancements without being restricted to using only Unity Catalog.

As we explore the horizon of technological innovation, it’s evident that the future is unfolding before us. Engaging with the latest advancements in data management and governance is more than just keeping pace; it’s about seizing the opportunity to redefine how we interact with the vast universe of data. The moment has come to embrace these new possibilities, leveraging their power to drive forward our data-centric initiatives.

Author: Pierre-Yves RICHER, Data Engineering Practice Leader at AKABI

AKABI’s Consultants Share Insights from Dataminds Connect 2023

November 20, 2023

Analytics, Business Intelligence, Data Integration, Event, Microsoft Azure

Read in 5 minutes

Dataminds Connect 2023, a two-day event held in the charming city of Mechelen, Belgium, has proven to be a cornerstone event for IT professionals and Microsoft data platform enthusiasts. Partly sponsored by AKABI, it gathers professionals and experts who share their knowledge and insights about data.

With a special focus on the Microsoft Data Platform, Dataminds Connect has become a renowned destination for those seeking the latest advancements and best practices in the world of data. We were privileged to have some of our consultants attend this exceptional event and we’re delighted to share their valuable feedback and takeaways.

How to Avoid Data Silos – Reid Havens

In his presentation, Reid Havens emphasized the importance of avoiding data silos in self-service analytics. He stressed the need for providing end users with properly documented datasets, making usability a top priority. He suggested using Tabular Editor to hide fields or make them private to prevent advanced users from accessing data not meant for self-made reports. Havens’ insights provided a practical guide to maintaining data integrity and accessibility within the organization.

Context Transition in DAX – Nico Jacobs

Nico Jacobs took on the complex challenge of explaining the concept of “context” and circular dependencies within DAX. He highlighted that while anyone can work with DAX, not everyone can understand its reasoning. Jacobs’ well-structured presentation made it clear how context influences DAX and its powerful capabilities. Attendees left the session with a deeper understanding of this essential language.

Data Modeling for Experts with Power BI – Marc Lelijveld

Marc Lelijveld’s expertise in data modeling was on full display as he delved into various data architecture choices within Power BI. He effortlessly navigated topics such as cache, automatic and manual refresh, Import and Dual modes, Direct Lake, Live Connection, and Wholesale. Lelijveld’s ability to simplify complex concepts made it easier for professionals to approach new datasets with confidence.

Breaking the Language Barrier in Power BI – Daan Lambrechts

Daan Lambrechts addressed the challenge of multilingual reporting in Power BI. While the tool may not inherently support multilingual reporting, Lambrechts showcased how to implement dynamic translation mechanisms within Power BI reports using a combination of Power BI features and external tools like Metadata Translator. His practical, step-by-step live demo left the audience with a clear understanding of how to meet the common requirement of multilingual reporting for international and multilingual companies.

Lessons Learned: Governance and Adoption for Power BI – Paulien van Eijk & Teske van Maaren

This enlightening session focused on the (re)governance and (re)adoption of Power BI within organizations where Power BI is already in use, often with limited governance and adoption. Paulien van Eijk and Teske van Maaren explored various paths to success and highlighted key concepts to consider:

Practices: Clear and transparent guidance and control on what actions are permitted, why, and how.
Content Ownership: Managing and owning the content in Power BI.
Enablement: Empowering users to leverage Power BI for data-driven decisions.
Help and Support: Establishing a support system with training, various levels of support, and community.

Power BI Hidden Gems – Adam Saxton & Patrick Leblanc

Attending Adam Saxton and Patrick Leblanc’s “Power BI Hidden Gems” session was a truly enlightening experience. These YouTube experts covered topics such as query folding, preferring Dual to Import mode, model properties (discouraging implicit measures), Semantic link, Deneb, and incremental refresh in a clear and engaging manner. Their presentation style, a hallmark of experienced YouTubers, made even the most intricate aspects of Power BI accessible and the learning experience both enjoyable and informative.

The Combined Power of Microsoft Fabric for Data Engineer, Data Analyst and Data Governance Manager – Ioana Bouariu, Emilie Rønning and Marthe Moengen

I had the opportunity to attend the session entitled “The Combined Power of Microsoft Fabric for Data Engineer, Data Analyst, and Data Governance Manager”. The speakers adeptly showcased the collaborative potential of Microsoft Fabric, illustrating its newfound relevance in our evolving data landscape. The presentation effectively highlighted the seamless collaboration facilitated by Microsoft Fabric among data engineering, analysis, and governance roles. In our environment, where these roles can be embodied by distinct teams or even a single versatile individual, Microsoft Fabric emerges as a unifying force. Its adaptability addresses the needs of diverse profiles, making it an asset for both specialized teams and agile individuals. Its potential promises to open exciting new perspectives for the future of data management.

Behind the Hype, Architecture Trends in Data – Simon Whiteley

I thoroughly enjoyed Simon Whiteley’s seminar on the impact of hype in technology trends. He offered valuable insights into critically evaluating emerging technologies, highlighting their journey from experimentation to maturity through the Gartner Hype Cycle model.

Simon’s discussion on attitudes towards new ideas, the significance of healthy skepticism, and considerations for risk tolerance was enlightening. The conclusion addressed the irony of consultants cautioning against overselling ideas, emphasizing the importance of skepticism. The section on trade-offs in adopting new technologies provided practical insights, especially in balancing risk and fostering innovation.

In summary, the seminar provided a comprehensive understanding of technology hype, offering practical considerations for navigating the evolving landscape. Simon’s expertise and engaging presentation style made it a highly enriching experience.

In Conclusion

Dataminds Connect 2023 was indeed a remarkable event that provided valuable insights into the world of data. We want to extend our sincere gratitude to the organizers for putting together such an informative and well-executed event. The knowledge and experiences gained here will undoubtedly contribute to our continuous growth and success in the field. We look forward to being part of the next edition and the opportunity to continue learning and sharing our expertise with the data community. See you next year!

Vincent Hermal, Azure Data Analytics Practice Leader
Pierre-Yves Richer, Azure Data Engineering Practice Leader
with the invaluable participation of Sophie Opsommer, Ethan Pisvin, Pierre-Yves Outlet and Arno Jeanjot

DP-500 : How to successfully pass the exam?

January 27, 2023

Analytics Microsoft Azure

Are you looking to earn the Microsoft certification DP-500: Designing and Implementing Enterprise-Scale Analytics Solutions Using Microsoft Azure and Microsoft Power BI? If so, you’re not alone! This certification is highly sought after by professionals looking to advance their careers in the field of data analytics. In this LinkedIn article, we’ll provide some expert tips to help you prepare for and pass this important certification exam.

First, let’s start by looking at what this certification covers. The DP-500 certification is geared towards professionals who are responsible for designing and implementing large-scale analytics solutions using Microsoft Azure Synapse Analytics and Microsoft Power BI. This includes tasks such as designing data pipelines, managing data storage, and creating dashboards and reports for business users.

To prepare for the DP-500 exam, it’s important to have a strong understanding of the following topics:

Microsoft Azure: This includes knowledge of Azure data storage options (such as Azure SQL Database and Azure Data Lake), as well as Azure data processing and analytics tools. You’ll also need to be familiar with Microsoft Purview.

Microsoft Power BI: This includes knowledge of Power BI desktop and online, as well as how to design and publish reports and dashboards using Power BI. You’ll also need to be familiar with Power BI data modeling and visualization techniques.

Data management and data governance: You’ll need to understand how to manage data at scale, including tasks such as data cleansing, data transformation, and data security.

Data visualization: You’ll need to be able to design data visualizations that effectively communicate insights to business users.

Some advice from one of our consultants

It is understandable that you may be feeling anxious or unsure about your chances of success on the DP-500 exam, especially if you have no previous experience with Azure Synapse Analytics and Microsoft Purview. Before preparing for the exam, I had never used either tool. Both are covered on the exam, so you may need to spend additional time studying and gaining familiarity with them in order to be fully prepared.

It is important to note that four weeks of study is a reasonable amount of time to prepare for the exam, as long as you use your study time effectively and focus on the most important exam objectives.

So, what can you do to prepare for the DP-500 exam? Here are a few tips:

Use Microsoft’s official certification training materials: These materials are designed specifically to help you prepare for the DP-500 exam and are a great place to start.

Take online courses: There are many online courses available that can help you deepen your understanding of the topics covered on the DP-500 exam. One website that you might find helpful is Datamozart. This website offers a range of courses and resources for data professionals, including those preparing for the DP-500 exam.

Watch YouTube videos: There are many YouTube channels that offer helpful content for those preparing for the DP-500 exam. One channel that you might find particularly useful is Azure Synapse Analytics. This channel offers a range of videos on topics related to Azure Synapse Analytics, which is a key tool covered on the DP-500 exam.

Get insights from experts: Consider reaching out to experts in the field for advice on how to prepare for the DP-500 exam. Two Data Platform MVPs, Andy Cutler and Nikola Ilic, are known for their great explanations and insights on data platform topics. You might find it helpful to follow their blogs or watch their videos for additional guidance on preparing for the DP-500 exam.

Practice with sample questions: It is understandable that you may be looking for sample questions to help you prepare for the DP-500 exam. However, the quality and reliability of sample questions can vary greatly. Some may not accurately reflect the content or difficulty level of the actual exam, so they should not be your sole source of preparation. ExamTopics is a website that provides information and resources for various IT certification exams. When I studied for the exam, the site did not contain any practice questions for it, but you can now find sample questions there, and they will probably help you a lot.

Gain hands-on experience: There’s no substitute for real-world experience when it comes to preparing for the DP-500 exam. Try working on projects using Azure and Power BI to get a feel for how these tools work.

I wish you the best of luck as you prepare for the DP-500 exam. Remember to stay focused, stay motivated, and keep up with your studies. With hard work and dedication, you can succeed on the exam and achieve your certification goals.
