Working with Dbt and the Fabric Warehouse

Overview

Welcome to the world of Microsoft Fabric Warehouses and the deployment of data loading projects using Dbt (Data Build Tool). In this blog I’ll work through how these technologies can be combined to enhance data warehousing and analytics workflows. Essentially, Fabric Warehouse gives you the canvas and the paint brush, but doesn’t actually paint anything for you; Dbt is the tool that helps you paint the picture. I’m always interested in workflow, productivity, templating, and automation tools for data warehousing, and I believe Dbt can really help organisations put together a framework for their data warehousing/analytics projects.

With Dbt, I can build a dimensional modelling process from source (cleansed) data: I define the dimensions and facts, including slowly changing dimensions and incremental loads, and Dbt works out the table loading dependencies. I like that a lot.
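To make that concrete, here’s a minimal sketch of a slowly changing dimension handled with a Dbt snapshot (the model, table, and column names are hypothetical):

    {% snapshot dim_customer_snapshot %}
    {{
        config(
            target_schema='snapshots',
            unique_key='customer_id',
            strategy='check',
            check_cols=['customer_name', 'customer_segment']
        )
    }}
    -- On each run, Dbt compares these rows against the snapshot table and
    -- opens/closes validity records to track history (SCD type 2)
    select customer_id, customer_name, customer_segment
    from {{ ref('stg_customer') }}
    {% endsnapshot %}

Running dbt snapshot maintains the history table; downstream dimension models can then select the current or historical view from it.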


Video

This video will guide you through Dbt and how it can work with Fabric Warehouses, including the all-important limitations.


Understanding Microsoft Fabric

Microsoft Fabric is a Software-as-a-Service (SaaS) platform that brings together various data services on a common platform, with shared storage (OneLake) and capacity-based compute. This includes SQL data warehousing, Spark workloads, real-time streaming with Kusto Query Language (KQL), and Power BI. The platform is designed to handle both batch and real-time data processing, so it suits a variety of workloads for an organisation.

Key Features of Fabric Warehouse

  • Separation of Compute and Storage: Fabric Warehouse uses the Polaris engine (an evolution of the Synapse Serverless SQL pool engine) for compute and the Delta Lake format in OneLake for storage, with transactional consistency.
  • SQL Familiarity: Developers can bring their T-SQL knowledge to create tables, views, and stored procedures within the Warehouse (there are syntax limitations; see the T-SQL surface area documentation).
  • Dimensional Modeling: The Warehouse is optimized for dimensional modeling, making it ideal for building star schemas and other data warehouse structures. Star Schema All The Things! A small T-SQL sketch follows this list.
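To give a flavour of that T-SQL surface area, a dimension table sketch like the following (hypothetical columns) runs directly in a Fabric Warehouse; note that constraints must be declared NOT ENFORCED:

    CREATE TABLE dbo.dim_customer
    (
        customer_key     INT           NOT NULL,
        customer_id      VARCHAR(20)   NOT NULL,
        customer_name    VARCHAR(200),
        customer_segment VARCHAR(50),
        valid_from       DATETIME2(6),
        valid_to         DATETIME2(6)
    );

    -- Primary keys in Fabric Warehouse are metadata only:
    -- they must be NONCLUSTERED and NOT ENFORCED
    ALTER TABLE dbo.dim_customer
    ADD CONSTRAINT pk_dim_customer
        PRIMARY KEY NONCLUSTERED (customer_key) NOT ENFORCED;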

Introducing Dbt

Dbt is a powerful tool for transforming and loading data within a data platform (for me, this is a data warehouse). It allows developers to define data transformations using SQL and then compiles these transformations into executable SQL code. Dbt is particularly useful for managing dependencies between models (database tables) and ensuring that data transformations are executed in the correct order. This is the part of Dbt I really like: automatic creation of the DAG (directed acyclic graph) for data loading.

Dbt executes SQL (and Python, on adapters that support it) on the destination data platform itself, so the functionality available within Dbt depends on the syntax support of that platform.
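As an illustration of that dependency management, two models like these (all names made up, and the ‘sales’ source would be declared in a .yml file) are enough for Dbt to infer the load order stg_orders → fct_orders:

    -- models/staging/stg_orders.sql
    select
        order_id,
        customer_id,
        order_date,
        order_total
    from {{ source('sales', 'orders') }}

    -- models/marts/fct_orders.sql
    -- ref() is what builds the DAG: Dbt sees this model depends on stg_orders
    select
        o.order_id,
        o.customer_id,
        o.order_date,
        o.order_total
    from {{ ref('stg_orders') }} as o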

Why Use Dbt with Fabric Warehouse?

  • Framework for Data Transformation: Dbt provides a structured framework for defining and managing data transformations, making it easier to build and maintain data warehouses. That said, it can also lead to a mess of transformations if you don’t structure your Dbt project well – see the Dbt Best Practices material in the official Dbt documentation.
  • Dependency Management: Dbt automatically manages dependencies between models (tables), ensuring that transformations are executed in the correct sequence.
  • Schema Evolution: Dbt supports schema evolution, allowing users to add new columns to tables without disrupting existing data.

How Dbt Works with Fabric Warehouse

For the Fabric Warehouse, we can use the dbt-fabric adapter with dbt Core, and/or set up connectivity in dbt Cloud. I’ll be honest here and say I much prefer using Dbt Core as it integrates nicely into VS Code, has a suite of power tools that give the same DAG visualisation/dependency diagrams as in Dbt Cloud, and I can run the Dbt project from any compute and service that supports it (e.g. Databricks has a really nice task in Workflows that can trigger Dbt jobs…). The integration involves defining models in Dbt, which are then materialized as tables or views in the Fabric Warehouse.
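For dbt Core, connectivity lives in profiles.yml. A minimal sketch for the dbt-fabric adapter might look like the following – the server and warehouse names are placeholders, and you should check the adapter documentation for the authentication options available to you:

    my_fabric_project:
      target: dev
      outputs:
        dev:
          type: fabric
          driver: "ODBC Driver 18 for SQL Server"
          # SQL connection string of the Fabric workspace (placeholder value)
          server: "xxxxxxxx.datawarehouse.fabric.microsoft.com"
          database: "my_warehouse"
          schema: "dbo"
          # Azure CLI auth suits local development; service principals suit automation
          authentication: CLI
          threads: 4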

Example Workflow

  1. Define Models: Users define models in Dbt using SQL. These models specify how data should be transformed and loaded into the Warehouse.
  2. Compile and Execute: Dbt compiles the SQL code and executes it on the Fabric Warehouse, creating the necessary tables and views (see the sketch after this list).
  3. Manage Dependencies: Dbt automatically manages dependencies between data models, ensuring that transformations are executed in the correct order.
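To illustrate steps 1 and 2, here’s a simple model (hypothetical names) and, in the trailing comments, roughly the shape of the SQL Dbt runs against the Warehouse for a table materialisation:

    -- models/marts/dim_date.sql (what I write)
    {{ config(materialized='table') }}
    select
        date_key,
        calendar_date,
        year_number,
        month_number
    from {{ ref('stg_calendar') }}

    -- Roughly what Dbt executes on the Warehouse (simplified):
    -- CREATE TABLE [my_warehouse].[dbo].[dim_date] AS
    -- SELECT date_key, calendar_date, year_number, month_number
    -- FROM [my_warehouse].[dbo].[stg_calendar];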

Limitations and Caveats

While Dbt and Fabric Warehouse work well together, there are some limitations and caveats to be aware of:

  • Unsupported SQL Commands: Some T-SQL commands are not supported in Fabric Warehouse (again, see the T-SQL surface area documentation), which can lead to errors when running Dbt models.
  • Schema Evolution: While Dbt supports schema evolution, Fabric Warehouse has limitations on altering table schemas. For example, adding new columns is supported, but dropping or modifying columns is not – this affects which schema-change strategy you can use. I always use the “add new columns” strategy, as it not only adds new columns that appear in a model, but also ignores columns that have gone missing. A sketch of this configuration follows the list.
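Here’s the configuration I mean, as a sketch with hypothetical names – on_schema_change='append_new_columns' is the setting that gives the behaviour described above:

    -- models/marts/fct_sales.sql
    {{
        config(
            materialized='incremental',
            unique_key='sale_id',
            on_schema_change='append_new_columns'
        )
    }}
    select
        sale_id,
        customer_id,
        sale_date,
        sale_amount
    from {{ ref('stg_sales') }}
    {% if is_incremental() %}
    -- only pick up rows newer than what's already loaded
    where sale_date > (select max(sale_date) from {{ this }})
    {% endif %}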

Conclusion

I like combining Microsoft Fabric Warehouse with Dbt as I believe it offers a great solution for building and managing data warehouses, streamlining the management of table loading dependencies. I think that once we get proper ALTER TABLE support for dropping columns, we’ll be able to use any of the Dbt schema-change strategies. I’m looking forward to working with Dbt more over the coming months.


Resources
