Synapse Analytics Source Control with Serverless SQL Pools Databases
Working with source control in a Synapse Analytics workspace is a fairly straight-forward process, however you only have a choice between Azure DevOps or GitHub, and it only works with Git. Hooking up a Synapse workspace to a repo is straight-forward and allows for branching and merging. But, there’s a bit of a challenge in terms of the SQL databases, they are not included as part of source control – this needs to be handled separately.
In this blog we’ll get Synapse artifacts and SQL Databases source controlled together.
In this blog I’ll walkthrough a solution I’ve worked on recently in which both Synapse artifacts (pipelines, linked services etc.) are source controlled and can be branched and developed alongside database projects.
Goal
- Each engineer has access to a shared Synapse Analytics development workspace
- Each engineer has an Azure DevOps account and access to the repo (we’ll create this repo) that the Synapse workspace is connected to
- The engineers must be able to create a development branch and then merge changes back a via pull request into the main dev branch
- A Branch Policy must be in place so that the pull request is reviewed by at least 1 other person (not the originator)
Repo Structure
We’ll sync the shared development Synapse workspace to the main branch of an Azure DevOps repo, then we’ll add another folder called sqldatabases to the root of the repo. This has no affect on the Synapse workspace connectivity, and allows us to add the Serverless SQL Pools database projects to the same repo using Azure Data Studio, which can then be cloned/merged etc. as with any of the Synapse artifacts (you just don’t see the database source control folder in the Synapse workspace itself).
I’m using /main as the branch that is deployed to the shared development workspace, your repo structure may differ (e.g. a main branch for the live code deployed to a live Synapse workspace, and a shared_dev branch for the dev code). The same principles will apply.
Let’s go through what tasks we’ll be going through to set-up DevOps, link the Synapse workspace, and get Azure Data Studio Database projects up and running.
Basic Tasks
- Create a Git repo in an Azure DevOps project
- Link development Synapse workspace to new repo
- Use ADS to clone Synapse repo locally and add sqldatabase folder
- Create a database project in ADS from existing Serverless SQL Pools database
- Commit and push new database project into DevOps
- Branch and then development a new pipeline and new Serverless SQL Pools view.
- Create a pull request and merge back to main dev branch
Prerequisites
- A Synapse Analytics workspace
- An Azure DevOps account
- Azure Data Studio (ADS) installed at v1.46.1
- Database Projects extension installed in ADS v 1.3.1
- Ensure that you have at least v5.7 of the Azure SDK installed (you may get Build errors in ADS if it’s out of date)
Walkthrough
Let’s go through what steps are necessary to complete this scenario, please note that the development Synapse Analytics workspace has already been created, but have not yet been linked to source control. The Synapse workspace has a single Integration Runtime, a single Serverless SQL Pools database, and a single Pipeline to move data into an Azure Data Lake Gen2 folder.
Create Repo in Azure DevOps
- Login to an Azure DevOps account and select an existing project
- On the main menu click Repos
- Click the drop-down menu at the top of the screen and select New repository
- Give the repo a name like synapse_shared_dev, accept the defaults and click Create
Link Shared Synapse Workspace to GitHub
Once the repo has been created, we’ll configure the shared development Synapse workspace to connect to this repo. The shared dev workspace already has some artifacts in like Pipelines and a Serverless SQL Pools database.
- Login to the shared Synapse workspace
- Click on Manage > Git Configuration > Configure
- Select Azure DevOps in the repository type, then the relevant Active Directory tenant
- Select the relevant organization name, project name, then the synapse_shared_dev created in the earlier step.
- We’ll keep the collaboration branch as main and the publish branch as workspace_publish
- Keep root folder as / and ensure Import existing resources to repository is checked
- Select the main branch to import to, then click Apply
We’ve now synced the shared development Synapse workspace into the DevOps Git repo.
Clone Repo Locally using Azure Data Studio (ADS)
For this section, we’ll be using Azure Data Studio (ADS) (download here) to connect to Azure DevOps and clone the repo that the Synapse workspace is connected to, we’ll then add in the Database Project into a new folder and commit/push back to the main branch. Please note ADS needs to be v1.46.1 and the SQL Database Projects extension needs installing (using the Extensions area) and be up to v 1.3.1.
Please also check out Kevin Chant’s (MVP) blog on all things source control, a treasure trove of information.
- Open ADS and ensure you have not opened a workspace or folder
- File > Close Folder
- File > Close Workspace
- Click the Source Control icon on the left
- Select Clone Repository and enter the URL of the DevOps Git repo
- Find the URL in the DevOps repo by navigating to the main page of the repo and clicking the Clone button in the top-right and copy the HTTPS URL.
- When prompted, select a local folder to store the repo and click Select as Repository Destination
- Click on the Explorer icon to see the folder contents of the cloned Synapse repo.
- Click the New Folder icon in the top and give it a name like sqldatabases
Add Serverless SQL Pools Database Project to repo
- Click the Database Projects icon on the left and click Create New
- Select Azure SQL Database and enter a database name
- Select the newly created sqldatabase folder
- Enable the SDK-style project and click Create
- Once created, right-click the database/project name in the database projects browser, and select Change Target Platform.
- Change the platform to Azure Synapse Serverless SQL Pool
- Right-click the database/project name again and select Schema Compare
- Click Switch Direction to make the project the Target
- For the Source, click the ellipsis then:
- Select Database as the Type
- Click the connections icon and create a connection to the Synapse Serverless SQL Pools endpoint
- Authenticate and then select the relevant database
- Click Connect
Now the connections have been made, let’s include certain database objects.
- In the main schema compare screen, click Options (gear icon) then select the Include Object Types tab
- Enable the following:
- Credentials
- Database Encryption Keys
- Database Scoped Credentials
- Master Keys
- Click OK and when prompted select Yes to recompare.
- Then click Apply to update the project with the database objects
- Once done, right-click the database/project name again and select Build
We can now commit and push the Database Project into the DevOps repo.
- Select the Source Control icon
- Enter a commit message and click Commit (I usually click Yes for staging)
- Then click Sync changes to push to the repo.
If we browse the remote repo in DevOps we can see the sqldatabases folder with the Database Project.
Now that we’ve got the Serverless SQL Pools database project into source control, let’s now concentrate on branching the code, developing, and then merge that development work back into the main dev branch.
Develop in Synapse and Database Project
Let’s now amend an existing Pipeline and add a new View to the Serverless SQL Pools project.
Create Branch
- Login to the Synapse workspace and click on Integrate
- In the source control drop-down, select new branch
- Give the branch an appropriate name E.G. AC_20231012_Feature123 and click Create
Create a New or Amend Existing Pipeline
- I already have a pipeline in my Synapse workspace so I’ll add a Wait activity
- Then click Commit All in the top Synapse menu
Add New View to Serverless SQL Pools Database
- In ADS, click on the Source Control icon
- Click the ellipsis at the top and select Fetch
- Then click the ellipsis again and select Checkout to…
- You should see the remote branch you created earlier, select this.
- Once the branch has been checked out and is active in ADS, click the Database Projects icon
- Create a new folder Views (in my scenario) then right-click this folder and select New View
- Give the view a name and enter the SQL definition.
- Click File > Save
- Right-click the project and select Build again just to make sure everything is as expected.
- As this is a shared Synapse workspace, we should create a branch version of the Serverless SQL Pools database and publish the branch to that database. On the Serverless SQL Pools instance in Synapse, create a new database
CREATE DATABASE SQLWebTelemetryAnalysis_Branch
- Back in ADS, right-click the project and select Publish
- Select Connections and ensure the new branch database has been selected
- Click Publish
- The publishing screen should now show the progress of the dacpac publish and will show Deploy Dacpac Succeeded
- Switch to the Source Control area and Commit and Sync these changes to the branch
Create Pull Request
- In Synapse studio, select the source control drop-down and select Create pull request
- DevOps will open and you should see any Synapse artifacts that have been added/amended in the Files section as well as the database changes made from ADS.
- Ensure that the branch is the source and main is the target of the PR.
- Click Create for the PR and then Complete the merge
- Back in Synapse studio, select the main branch from the drop-down and ensure the changes are there
- Click Publish to push the changes to the workspace Synapse branch (if you want to schedule/trigger the pipeline)
Deploy Latest Database Changes
- Back in ADS, open the Source Control area
- Click the ellipsis again and select Checkout to…
- You should see the main branch, select this.
- Then select the ellipsis again and select Pull, this gets the changes merged earlier
- We can now right-click the database project and publish to the original Serverless SQL Pools database
Conclusion
So we can now align a source control strategy using the out-of-the-box Synapse integration with Azure DevOps/Git and add in the new Serverless SQL Pools database projects from Azure Data Studio. We can branch, development and merge back in. Of course how you deploy to other environents (UAT/Live) is up to you, as long as both the Synapse artifacts and Database changes are included.
References
- Synapse implementation success methodology: Evaluate solution development environment design – Azure Synapse Analytics | Microsoft Learn
- Set Git branch security and permissions – Azure Repos | Microsoft Learn
- Source control in Synapse Studio – Azure Synapse Analytics | Microsoft Learn
- Download and install Azure Data Studio – Azure Data Studio | Microsoft Learn
- Getting started with the SQL Database Projects extension – Azure Data Studio | Microsoft Learn