The Reality of ALTER TABLE… in Fabric Warehouses


Overview

With the recent announcement from Microsoft that ALTER TABLE…ADD COLUMN is now supported in Fabric Warehouses, you might be forgiven for saying “So what? Database systems have had this feature for decades!” And you’d be correct. But bear in mind that the engine powering the Fabric Warehouse started its public life five years ago as Synapse Serverless SQL Pools, and this Polaris engine has been significantly enhanced within the Fabric ecosystem to deliver the Lakehouse SQL Endpoint and Warehouse services.

As the Warehouse SQL engine uses Delta to handle transactions and the Parquet file format to store data (at its simplest), any functionality added to Warehouses must work with the Delta format. In this blog I’ll cover:

  • ALTER TABLE support up until now
  • New ALTER TABLE…ADD COLUMN
  • Can we DROP and ALTER an existing column?
  • Delta Versioning & what’s in the Delta log…

Thanks to Mark Pryce-Maher for suggesting I “go poke around the Delta logs…”


Previous ALTER TABLE Support

Up until recently (July 2024), the available functionality within Fabric Warehouses to alter tables was limited to keys and constraints. We could add primary keys, unique constraints, and foreign keys to Warehouse tables by using ALTER TABLE… There is no support for adding keys or constraints in the CREATE TABLE statement; it must be done via ALTER TABLE. It’s worth noting that these keys and constraints must be created with the NOT ENFORCED keyword.

In the example below we’ll create a table in the Warehouse and alter it to add a primary key.

--create table
CREATE TABLE dbo.DimProduct
(
    ProductKey INT NOT NULL,
    ProductName VARCHAR(255)
);

--add primary key
ALTER TABLE dbo.DimProduct
ADD CONSTRAINT PK_DimProduct PRIMARY KEY NONCLUSTERED (ProductKey) NOT ENFORCED;
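
Unique constraints and foreign keys follow the same pattern. As a quick sketch (the dbo.FactSales table and its ProductKey column below are hypothetical, purely to illustrate the foreign key syntax):

--add a unique constraint
ALTER TABLE dbo.DimProduct
ADD CONSTRAINT UQ_DimProduct_ProductName UNIQUE NONCLUSTERED (ProductName) NOT ENFORCED;

--add a foreign key from a hypothetical fact table
ALTER TABLE dbo.FactSales
ADD CONSTRAINT FK_FactSales_DimProduct FOREIGN KEY (ProductKey)
    REFERENCES dbo.DimProduct (ProductKey) NOT ENFORCED;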

Introducing ALTER TABLE…ADD COLUMN

As of July 2024, we are able to issue an ALTER TABLE…ADD COLUMN statement to add a new nullable column to a table. All we need to do is issue a standard ALTER TABLE…ADD statement and the new column will be added to the existing table.

ALTER TABLE dbo.DimProduct
ADD ProductSubCategory VARCHAR(255);
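
If you want to confirm the column landed as expected, the standard catalog views work in the Warehouse; a minimal check (any similar query will do):

--confirm the new column exists and is nullable
SELECT name, is_nullable
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.DimProduct');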

Time Travel Issue When Altering a Table

It’s worth noting that issuing an ALTER TABLE statement resets the earliest point in time the table can be queried with time travel to the date/time the table was altered. With the previous functionality of adding primary, unique, and foreign key constraints this probably didn’t surface that often, as constraints tend to be implemented when a table is first created. But now that we can add new columns, the issue may surface more regularly. One to watch out for.
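
As a minimal sketch of the impact (the timestamp below is illustrative), a time travel query pointing at a date/time before the ALTER TABLE ran will now fail:

--query the table as of a UTC point in time; this errors if that point
--pre-dates the most recent ALTER TABLE on the table
SELECT ProductKey, ProductName
FROM dbo.DimProduct
OPTION (FOR TIMESTAMP AS OF '2024-07-01T10:00:00');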


Any Other Support for ALTER TABLE?

But the fun stops there (for the time being): if we try any other ALTER TABLE statement, either to DROP a column or to ALTER a column’s data type, it doesn’t work.

ALTER TABLE dbo.DimProduct
ALTER COLUMN ProductSubCategory VARCHAR(100);

ALTER TABLE dbo.DimProduct
DROP COLUMN ProductSubCategory;

Both commands above fail with the same error message. Well, let’s look at why…


Delta Table Protocol Versioning

Delta tables have the concept of protocol versioning, which impacts the functionality available when working with a Delta table. Let’s move to the Lakehouse now, as there we have control over the Delta version/features we can use. In comparison, the Warehouse is a “closed box” in terms of the Delta table versioning available.

Create and Modify Delta Table

Let’s create a new Delta table in a Lakehouse using SQL in a Notebook; we’ll then alter the table to add a new column.

%%sql
CREATE TABLE dimproduct 
(
    ProductKey INT,
    ProductName STRING
) USING DELTA;

ALTER TABLE dimproduct ADD COLUMN ProductSubCategory STRING;

We can then try to alter the table and drop an existing column (or rename an existing column).

%%sql
ALTER TABLE dimproduct DROP COLUMN ProductSubCategory

This doesn’t work and we’ll get an error stating:

DROP COLUMN is not supported for your Delta table. Please enable Column Mapping on your Delta table with mapping mode 'name'. You can use one of the following commands.

If your table is already on the required protocol version: ALTER TABLE table_name SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name')

If your table is not on the required protocol version and requires a protocol upgrade:

ALTER TABLE table_name SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '5')

We do have a way in PySpark to drop the column: read the table into a DataFrame, drop the column, and write the data back, specifying that the schema should be overwritten. But that means reading and re-writing all of the table’s data…

#read the existing table into a DataFrame
df = spark.sql("SELECT * FROM AIRBNB_RAW.dimproduct")

#drop the column, then overwrite the table data and its schema
df.drop(df.ProductSubCategory) \
    .write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", True) \
    .save("Tables/dimproduct")

Upgrade Delta Table

Let’s upgrade the Delta table by using the code displayed in the original error message (pretty handy to be given the code in the actual error message itself!).

NB: it’s very much worth noting at this stage that although we can upgrade the Delta table in a Lakehouse to support column mapping, it will break metadata synchronization to the Lakehouse SQL Endpoint & Semantic Model, as they are (like the Warehouse) not aware of the column mapping feature. Thanks to Frithjof Vassbø for reminding me!

%%sql
ALTER TABLE dimproduct 
SET TBLPROPERTIES ( 'delta.columnMapping.mode' = 'name', 'delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '5')

We can now drop the column with no errors.

%%sql
ALTER TABLE dimproduct DROP COLUMN ProductSubCategory
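
With column mapping enabled, renaming an existing column (which was blocked earlier for the same reason as DROP COLUMN) should also now work; a minimal sketch, with an illustrative new name:

%%sql
ALTER TABLE dimproduct RENAME COLUMN ProductName TO Product_Name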

What’s in the Delta Transaction Log?

So how does this work then? Let’s take a look inside the Delta log for the operations we carried out above. We’ll look at the log from when the table was first created in the Lakehouse, then at the log after the table properties were changed to enable column mapping, and finally at the Warehouse table’s log.

Create and Modify Delta Table

This is the Delta transaction log for creating the table in the Lakehouse (I’ve omitted parts of the log to make it slightly more readable). In schemaString you can see the fields in the table and each field’s associated metadata: name, type, and nullable.

{
    "protocol": {
        "minReaderVersion": 1,
        "minWriterVersion": 2
    }
}
{
    "metaData": {
        "id": "f439cc23-498d-4713-80e8-5f732032b0ab",
        "format": {
            "provider": "parquet",
            "options": {}
        },
        "schemaString": "{\"type\":\"struct\",\"fields\":[{\"name\":\"ProductKey\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"ProductName\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}",
        "partitionColumns": [],
        "configuration": {},
        "createdTime": 1721598597504
    }
}

After the ALTER TABLE…ADD COLUMN statement was issued:

{
    "commitInfo": {
        "timestamp": 1721641468097,
        "operation": "ADD COLUMNS",
        "operationParameters": {
            "columns": "[{\"column\":{\"name\":\"ProductSubCategory\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}}]"
        }
    }
}
{
    "metaData": {
        "schemaString": "{\"type\":\"struct\",\"fields\":[{\"name\":\"ProductKey\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"ProductName\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"ProductSubCategory\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}",
        "partitionColumns": [],
        "configuration": {},
        "createdTime": 1721598597504
    }
}

We can see the new column has been added and is present in the schemaString section.

Now let’s see the Delta log after we upgrade the table. We get some interesting new metadata, including:

  • {\"delta.columnMapping.id\":1,\"delta.columnMapping.physicalName\":\"ProductKey\"}
  • {\"delta.columnMapping.id\":2,\"delta.columnMapping.physicalName\":\"ProductName\"}
  • {\"delta.columnMapping.id\":3,\"delta.columnMapping.physicalName\":\"col-f772a929-55a7-477c-924c-5dc1b8aeb86e\"}
{
    "metaData": {
        "schemaString": "{\"type\":\"struct\",\"fields\":[{\"name\":\"ProductKey\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"delta.columnMapping.id\":1,\"delta.columnMapping.physicalName\":\"ProductKey\"}},{\"name\":\"ProductName\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"delta.columnMapping.id\":2,\"delta.columnMapping.physicalName\":\"ProductName\"}},{\"name\":\"ProductSubCategory\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"delta.columnMapping.id\":3,\"delta.columnMapping.physicalName\":\"col-f772a929-55a7-477c-924c-5dc1b8aeb86e\"}}]}",
        "partitionColumns": [],
        "configuration": {
            "delta.columnMapping.mode": "name",
            "delta.columnMapping.maxColumnId": "3"
        },
        "createdTime": 1721643187063
    }
}

This is how operations that alter existing columns can now work: each column is assigned a unique delta.columnMapping.id, and its logical name is decoupled from the delta.columnMapping.physicalName used in the underlying Parquet files. Dropping or renaming a column therefore becomes a metadata-only change in the Delta log, with no need to rewrite the data files.
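
We don’t need to poke around the log files to see these settings; a quick check in the same Notebook (SHOW TBLPROPERTIES is standard Spark SQL) returns the column mapping and protocol properties we set earlier:

%%sql
SHOW TBLPROPERTIES dimproduct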


Warehouse

If we take a look at the Delta log of the Warehouse table, we can see there is no columnMapping metadata present.

{
    "protocol": {
        "minReaderVersion":1,
        "minWriterVersion":1
    }
}

{"metaData": {
    "id":"18F30AE4-0ACA-4503-A8FD-63635C5C55D5",
    "name":"DimProduct",
    "format": {
        "provider":"parquet",
        "options":{}
    },
    "schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"ProductKey\",\"type\":\"integer\",\"nullable\":false,\"metadata\":{}},{\"name\":\"ProductName\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}",
    "partitionColumns":[],"configuration":{}}}

Let’s finally look at the properties of the table in the Warehouse and Lakehouse using DESCRIBE.

#Lakehouse table
display(spark.sql("DESCRIBE DETAIL delta.`abfss://<onelake_lakehouse_table_location>/Tables/dimproduct`"))

#Warehouse table
display(spark.sql("DESCRIBE DETAIL delta.`abfss://<onelake_warehouse_table_location>/Tables/dimproduct`"))

I’ve only selected a few columns from the DESCRIBE output and added a Table Type column to show where the properties come from. You can see that the Warehouse table has no columnMapping properties, and both its minReaderVersion and minWriterVersion are set to 1.

Table Type | format | properties                                                                    | minReaderVersion | minWriterVersion
Lakehouse  | delta  | {"delta.columnMapping.mode": "name", "delta.columnMapping.maxColumnId": "3"} | 2                | 5
Warehouse  | delta  | {}                                                                            | 1                | 1

All this means is that the Warehouse doesn’t currently support the Delta column mapping feature, but as we’ve seen, Fabric is a fast-evolving product, so keep an eye out!

