Microsoft Fabric Warehouses for the Database Administrator: Areas of Interest
With cloud technology, vendors will say you don’t have to worry about anything, it just works! It’s magic to a certain extent but also can be disconcerting if you can’t actually do anything due to a service being SaaS (Software-as-a-Service). Well, you might not have control over certain elements of the Fabric Warehouse, but you can certainly understand it. Imagine being tossed the keys to a Fabric Warehouse without so much as a “hi”!
Overview
data:image/s3,"s3://crabby-images/1cc01/1cc0106b7530c888997bb46fcab202ad780e95ee" alt=""
What is Microsoft Fabric?
Microsoft Fabric is Microsoft’s new data and analytics platform that enhances a Power BI tenant with significant functionality for data engineering, data warehouse, & realtime analytics. It uses open-source Delta Lake and Parquet files for storage (I would have loved to have been a fly on the wall at the Microsoft meeting when they agreed on using Delta!) and abstracts compute control into offering capacity units (CU) instead of traditional servers.
Benefits of Fabric
The whole point of Fabric is to provide a platform to allow end-to-end data solution, with data processing, ingestion and loading, storage, reporting, and machine learning all in the same platform without needing to do any of that “plumbing” between disparate cloud services.
data:image/s3,"s3://crabby-images/88cb8/88cb84e4e2a856b4444a34648ae5148bc7777cd8" alt=""
Fabric Warehouse & Usage Scenarios
Fabric Warehouse is a “Software-as-a-Service” (SaaS) solution designed for data storage and analysis using a data warehouse methodology, think star schemas. It still has all the familiar objects you expect from a database system: tables, stored procedures, functions etc.
- Open Storage Format: Utilizes Parquet for compressed, columnar storage.
- ACID Transactions: Delta Lake manages the transaction side of reading and writing data, plus helps deliver features like table cloning and time travel.
Fabric Warehouses are ideal for data warehousing workloads, particularly those involving dimensional modeling and star schemas. In a Medallion architecture, the warehouse serves as the “Gold” zone, handling modeled data for analysis.
Read more: Dimensional modeling in Microsoft Fabric Warehouse – Microsoft Fabric | Microsoft Learn
Role of the Database Administrator
Database administrators play a crucial role in managing Fabric Warehouses. They are your friend and you should buy them pizza (or whatever they like). Their responsibilities include (but certainly not limited to):
- Managing access to databases.
- Ensuring business continuity and disaster recovery.
- Installing and configuring database software.
- Troubleshooting issues and maintaining system uptime.
- Minimizing downtime during deployments.
- Fixing SQL Developers code…
See? An important list of responsibilities. When you operate services in the cloud then some of that will be abstracted away by the cloud vendor, but you need to pick up the rest and certainly have an appreciation of what’s happening “behind the scenes.”
Fabric Compute
Fabric compute needs to be provisioned in Azure, with various SKUs available for purchase. Capacity planning is essential to ensure the system can handle average workloads, utilizing bursting and smoothing techniques to manage demand.
data:image/s3,"s3://crabby-images/5e48b/5e48bfb2fc8d51adea203ba2795575292ec203a2" alt=""
SKU Guardrails
While Fabric allows for bursting and smoothing of workloads, there are guardrails in place to ensure reasonable constraints. Different capacity sizes have specific multiplication factors applied to manage resource usage effectively.
data:image/s3,"s3://crabby-images/f0d3c/f0d3c95b22d54cbb68f1ab8fe6638b68cc90951f" alt=""
Read more: Burstable capacity – Microsoft Fabric | Microsoft Learn
Connectivity and Client Tools
Fabric Warehouse integrates seamlessly with various client tools, including SQL tools like SSMS and Azure Data Studio. All workspaces have a single SQL Endpoint for connectivity, simplifying access and management. The URL in Lakehouse SQL Endpoints and Warehouses within a workspace is the same. When you connect using a client tool like SSMS, the workspace acts like a server, with the Lakehouse SQL Endpoints and Warehouses appearing as databases.
data:image/s3,"s3://crabby-images/9ceea/9ceeac1ad36d48f26affaadc8eed78e659180a79" alt=""
Read more: Workload management – Microsoft Fabric | Microsoft Learn
Storage
We’ve touched on the storage layer being Delta Lake and Parquet, but what can you actually do with the storage? There are automatic “optimise” and “vacuum” maintenance on the Delta tables based on usage, you can’t actually do anything yourself in terms of optimising/vacuuming tables.
data:image/s3,"s3://crabby-images/5af4b/5af4bcd78cf4eab980b87296e22a71ff61092442" alt=""
We can actually use Spark in a Notebook to read the Warehouse table metadata (and actually read the table data itself).
#Warehouse table display(spark.sql("DESCRIBE DETAIL delta.`abfss://<onelake_warehouse_table_location>/Tables/dimproduct`"))
Read more: Delta Lake logs in Warehouse – Microsoft Fabric | Microsoft Learn
Transactions and Isolation Levels
Fabric Warehouse supports transactions, including multi-table transactions, with Snapshot Isolation as the only supported isolation level. Locks are managed at the table level, and DMVs provide insights into current locks and transaction statuses.
INSERTS will rarely be blocked as Parquet files are immutable, so all INSERTS are done on new Parquet files with the Delta Lake logs being updated. However, UPDATES and DELETES can fail under certain circumstances. In the example below, 2 users issue UPDATE statements on a Warehouse table. Even though User 1 started the transaction first, if User 2 finished first, then User 1s UPDATE will fail.
Example
data:image/s3,"s3://crabby-images/ccceb/cccebe8a8ccc171413cd6e46825da5841f487061" alt=""
DELETE operations will have the same issue as UPDATES.
Read more: Transactions in Warehouse tables – Microsoft Fabric | Microsoft Learn
Security and Permissions
Security in Fabric Warehouses is multi-level which can cause some confusion, with permissions applied at the Fabric workspace, warehouse, and object levels (tables etc). There is no concept of CREATE LOGIN or CREATE USER in the Warehouse. You must “share” the Warehouse to a user/group to create the logic. From there you can then apply relevant permissions.
data:image/s3,"s3://crabby-images/69310/69310d19e507edf4ee331605fa6e611af1c4c324" alt=""
Read more: Security for data warehousing – Microsoft Fabric | Microsoft Learn
Monitoring
Monitoring tools like the Fabric Capacity Metrics App and Query Insights DMVs provide detailed insights into query performance and resource utilization. Performance tuning involves optimizing statistics, choosing the best data types, and managing batch inserts.
Read more: Monitoring in Fabric Data Warehouse overview – Microsoft Fabric | Microsoft Learn
Disaster Recovery
Fabric Warehouse offers limited disaster recovery options, including restore-in-place. You can restore to a known “good” state and is useful when any corruption occurs, a failed deployment, or a failed data loading process which has corrupted data. But we can only restore the whole Warehouse only, not individual tables. There is also 30 day retention period we can restore to, this is set by Microsoft and can’t be changed. The 30 day retention period is tied to the Delta log, and is the same retention period for writing “time travel” queries.
Read more: Restore in-place of a warehouse from a restore point – Microsoft Fabric | Microsoft Learn
Source Control
Azure Data Studio and Git integration enable source control for Fabric Warehouse, supporting build and deploy processes. Deployment pipelines should be used with caution to avoid data loss. Things are little trickier here and not as straight forward as I’d like it to be.
There’s a big problem at the moment with schema changes, you can’t issue any of the following:
- ALTER TABLE DROP COLUMN…
- ALTER TABLE ALTER COLUMN…change properties
This means we have issues when using tools like sqlpackage to deploy changes.
There’s also an issue with Fabric Deployment Pipelines in that if you do change the schema of a table outside of ADDing columns, the deployment pipeline process will drop the destination table and recreate it…without the original data.
Read more: Source control with Warehouse (preview) – Microsoft Fabric | Microsoft Learn
Conclusion
I feel Microsoft can invest more into the Fabric Warehouse service, I appreciate it as a new Microsoft “proprietary” SQL engine using open source technology but there is a ways to go for it to be a fully-formed service. However, I hope the pointers in this blog help you work more effectively with the Fabric Warehouse service.