I want to cover one way you can do CI/CD for Azure Synapse Analytics serverless SQL pools using Azure DevOps in this post. Because I know it is a popular topic.
It is related to my post about how you can create a dacpac for an Azure Synapse Analytics dedicated SQL pool using Azure DevOps. Since they are both based in the same service.
Plus, a while ago I wrote about the increase in demand for Data Platform automation. So, I really wanted to do a post about how you can do CI/CD for Azure Synapse Analytics serverless SQL pools.
By the end of this post, you will know one way you can achieve this. In addition, some open-source tools that can help SQL Server and Oracle professionals do migration-based deployments.
CI/CD for serverless SQL pools video
You can see the contents of this post in action in the video for the November 2022 edition of the Azure Synapse Analytics and MVP series.
It is worth noting that there are other solutions available to do this. Which are either open-source or come at a cost because they provide more functionality.
Azure DevOps template available online
Before I go into too much detail, I have made a public repository available in GitHub that you can use as a template to do this yourself in Azure DevOps. It is called AzureDevOps-SynapseServerlessSQLPool.
You can access it by clicking on the name for it. There are also resources to help inside that repository. Including comments and a populated wiki that contains details on how to do various things. Like how to use Azure Key Vault to store variables.
One key point to remember is that when you create the password variable you must surround the value with single quotes. For example, for the password TopSecret you would enter ‘TopSecret’ with the quotes either side of it.
There are various ways to manage the variables in the AzureDevOps-SynapseServerlessSQLPool repository. Including variables containing secrets from Azure Key Vault.
For ideas about managing these variables I recommend reading one of my other post which covers how to keep your Azure Synapse secrets secret in Azure DevOps.
If it proves useful for you, please give it a star in GitHub.
First success with serverless SQL pool
It took a while to get to an efficient process that satisfied me. My first success in getting CI/CD to work with a serverless SQL pool was by using DbUp.
DbUp is a .NET library that you can use to do migration-based deployments. It is open-source and is licensed under the MIT license, which you can read about in the DbUp license file.
According to the official list of supported databases, it allows you to do migration-based to various databases like SQL Server and MySQL. However, I discovered it can also work with an Azure Synapse Analytics serverless SQL pool.
I had to do a few things to first get it to work in Visual Studio.
First, I discovered that the create database option does not work with serverless pools. So, I had to create the database first.
Afterwards, I discovered that I had to change the code so that it would not attempt to create a table called SchemaVersions. Which is used to track which scripts have been run. Because I was unable to create the table in a separate Azure SQL Database.
Idea is that when you want to make a schema change you add a new script to the above project. However, because there is no way to track when the scripts were run against a database DbUp will run all the scripts again every time.
So, if you are going to use DbUp or the following method in Azure DevOps my advice is to put defensive logic in your SQL code. Like in the below sample taken from the ‘Script0002 – Alter table.sql’ file that I used in the above Visual Studio project. Which I have shortened on purpose.
IF EXISTS (SELECT * FROM sys.external_tables WHERE name = 'nyc_tlc_yellow_trip_ext')
DROP EXTERNAL TABLE nyc_tlc_yellow_trip_ext
GO
CREATE EXTERNAL TABLE nyc_tlc_yellow_trip_ext (
[vendorID] varchar(8000),
[tpepPickupDateTime] datetime2(7)
)
WITH (
LOCATION = 'yellow/puYear=2014/puMonth=3/*.parquet',
DATA_SOURCE = [nyctlc_azureopendatastorage_blob_core_windows_net],
FILE_FORMAT = [SynapseParquetFormat]
)
GO
CI/CD for serverless SQL pools using Azure DevOps
Moving on from there I wanted to do CI/CD for serverless SQL pools in both Azure DevOps and GitHub.
I first got this running in a pipeline by building the package using the ‘dotnet build’ command and then publishing it as an artifact. It worked; however, I thought this way was not as efficient as it could be.
So, I looked to see there was some PowerShell that could help. Whilst searching using my favourite search engine I spotted DBOps. I remembered that my friend Sander Stad (l/t) had mentioned it in a previous conversation.
So, I thought I would check it out. It is a GitHub repository that is in the Data Platform Community organization (previously known as the sqlcollaborative organization). Which was created by Kirill Kravtsov (l/t). Just like DbUp it is open-source and licensed under the MIT license. In fact, it uses DbUp for its deployment functionality.
Officially it supports SQL Server, Oracle, PostgreSQL and MySQL relational databases. However, I was able to get it to work with an Azure Synapse serverless SQL pool locally. So, I tried it in Azure DevOps.
I created a repository in Azure DevOps that contained only four files. A readme file, a yaml file for the pipeline and the two sample scripts. As you can see below it worked fine, with only those four files.
SchemaVersions table
One thing to note is that I am unable to change the location of the SchemaVersions table to be somewhere outside of the database. As I mentioned earlier in this post, that is the table DbUp uses to track which scripts have already been executed in a particular database.
It would be better to point it to a table in an Azure SQL Database or a dedicated SQL Pool instead. Because the table uses an identity column which is not supported in serverless SQL pools.
So, for now I am I am using the ‘-SchemaVersionTable $null’ syntax at the end of the PowerShell command. I have raised this as an issue. However, I have also figured out a potential workaround for it below.
You can try and work around the journal issue by creating a file in Azure Data Lake Storage that contains the headings that are mentioned in the DbUp Journaling section. From there, you can look to add it as an external table within the database you want to use in your serverless SQL pool. From there, you can reference the external table in your code.
Testing CI/CD for serverless SQL pools with two workspaces
To check for consistency, I tested deploying the same two scripts to two separate serverless SQL Pools in two different workspaces.
For the first Azure DevOps example in this post, I put all the tasks in one job. Because I used a Microsoft-Hosted agent to deploy the package. So, I had to install the DBOps PowerShell module first before running the command.
However, for the second example I used a self-hosted agent with the DBOps module already installed. Which gives you more flexibility.
As you can see below, it made a big difference to the runtime of the pipeline as well.
Update about CI/CD for serverless SQL pools demos
On Friday October 29, myself and Sander Stad are presented at the Azure Community Conference. Where we did an exclusive demo relating to this post.
I have since done a demo of this at the Data Toboggan conference as well. You can click on the image below or on the link to watch the ‘KevOps for Azure Synapse Analytics‘ session.
Final words
I hope this post about CI/CD for Azure Synapse Analytics serverless SQL pools using Azure DevOps has given some of you inspiration. Because I am aware that it is an issue for a few people.
In addition, I hope it has introduced the DBOps module to some of you looking to implement migration-based deployments for other types of databases. Like Oracle and SQL Server.
Of course, if you have any comments or queries about this post feel free to reach out to me.
[…] Kevin Chant doesn’t have time for manual deployments: […]
Hi Kevin, I like your approach. I’ve taken a slightly different one when I started trying out something similar myself last week. I’m still considering dedicating my first ever blog post to it ^^
I’ve created a database project with nothing but defensive post deployment scripts. Currently, mine contains create (if not exists) scripts for:
– database
– master key
– schemas
– logins
– users
– database scoped credentials
– external data sources
– stored procedure
– views
By building the project, I end up with one single postdeploy.sql.
So, in my DevOps pipeline, I:
– build the project
– copy the dacpac
– rename it to .zip, unzip it, and take the postdeploy.sql
– replace all secret placeholders with KeyVault secrets using the Replace Tokens extension
– execute the postdeploy.sql using sqlcmd in Bash.
So far, it works like a charm 🙂
The downside is that I need to do proper testing on an existing pool, because building the project will never give me any errors. Then again, that’s no different than any other postdeploy script.
Interesting workaround, you can use a SQL Server database to do a basic level of unit testing using tSQLt during the early stages of a pipeline. It looks like you can avoid converting the project to an Azure SQL Datawarehouse in the dacpac as well since you are extracting the sql files within it.
However, like you said for a more thorough level of testing you will need a database inside an existing pool.
I chose to build the project because this approach is similar to the pipelines we already have in place for dedicated sql pools. I have all the scripts that I mentioned above as separate scripts with BuildAction None, just for the sake of having some order in my project. And one file with BuildAction PostDeploy in which I call all separate files. The result after building is one file with a fixed name (postdeploy.sql) that I can easily use as input for my sqlcmd statement. Otherwise, I would have to use something like PowerShell to loop over the folder, collect all .sql files, put those into a variable and use that as input. My PS skills are a bit shaky, hence my other approach 🙂
This procedure sounds very familiar to me.
[…] a start, in a previous post I wrote about how you can CI/CD for serverless SQL pools using Azure DevOps. So, I thought I would balance things out and show how you can do the same thing within […]
[…] a start, in a previous post I wrote about how you can CI/CD for serverless SQL pools using Azure DevOps. So, I thought I would balance things out and show how you can do the same thing within […]
[…] about how to do CI/CD for serverless SQL pools using GitHub Actions. I also did a post about how to CI/CD for serverless SQL Pools using Azure DevOps as […]
[…] You can then uses them as a source for Azure Pipelines to do various deployments. For example, CI/CD for serverless SQL pools. […]
[…] preparing for this session. Along the way I realized I had to update my post about how to do CI/CD for serverless SQL pools using Azure DevOps. To include the fact that the password variable must be surrounded by single […]
[…] SQLBits I showed the same method that I covered in a previous post. Where I covered how to perform CI/CD for serverless SQL pools using Azure DevOps. Which uses the DBOps PowerShell […]
[…] Please note – This blog was originally published on my personal blog here. […]
[…] Please note that this post applies to dedicated SQL Pools. To learn one way you can do CI/CD for serverless SQL Pools read my other post on how to do CI/CD for serverless SQL pools using Azure DevOps. […]
[…] Plus, coming up with solutions to help others in the Data Platform community. Like how to perform CI/CD for serverless SQL Pools. […]
[…] Over time I decided to add a Wiki to some of the more popular repositories. To provide more details about how to use them. Which you can see in the Wiki section of the GitHub repository that contains a template to perform CI/CD for serverless SQL Pools using Azure DevOps. […]
Hello Kevin,
Much appreciated for writing this blogpost. Instead of using variables to store the credentials, we dumped the used SPN credentials to retrieve the passwords in keyvault using PS.
Thanks!
[…] You can read more about that in my blog post on how to perform CI/CD for serverless SQL pools using Azure DevOps. […]
[…] Which includes this video for Festive Tech Calendar and another one for last months edition of the Azure Synapse Analytics and Microsoft MVP series. Where I showed how to perform CI/CD for serverless SQL Pools. […]
[…] first post relating to this covered performing CI/CD for serverless SQL pools using Azure DevOps. This is the first post where I go into detail about using the dbops PowerShell module with Azure […]
[…] If you prefer For migration-based deployments you can still use the method I covered in my post about CI/CD for serverless SQL Pools. […]
[…] a more complete version of implementing CI/CD for your serverless SQL pool, you can check out this blog by Kevin Chant, in which he uses a .NET library called […]