In this post I show one way you can populate a Microsoft Fabric Lakehouse in different Microsoft Fabric deployment pipeline stages.
I want to cover this topic since it is a common question. In fact, just recently somebody asked me about this at Data Saturday Holland.
By the end of this post, you will know one way to populate a Microsoft Fabric Lakehouse in different deployment pipeline stages. I show this by orchestrating the deployment with Azure Pipelines in Azure DevOps.
Like in the below release pipeline, which contains an additional stage that runs a notebook to populate a Lakehouse.
Plus, in this post I show one way you can copy a csv file stored in a GitHub repository into a Lakehouse.
I must stress that the example in this post is a basic scenario, intended to show you how to populate a Microsoft Fabric Lakehouse in different deployment stages.
Because the main aim of this post is to highlight how you can populate an empty Lakehouse and to inspire you to be creative in your own solutions.
Please read
Please note, a lot of the API calls in this post currently do not support service principals. Therefore, I often authenticate with a Microsoft Entra ID user account that has multifactor authentication (MFA) disabled.
I strongly recommend working with this in a test environment until service principals are supported, unless working with a regular account that has multifactor authentication disabled is acceptable in your production environment. Which is unlikely.
Source of examples in this post to populate a Microsoft Fabric Lakehouse
For this post I worked with the below Microsoft Fabric deployment pipeline, which is connected to three workspaces that represent three different deployment stages.
In addition, I cloned an existing release pipeline in Azure DevOps that I showed in a previous post, where I covered how you can utilize the DeploymentPipelines-DeployAll PowerShell script and Azure DevOps to update Microsoft Fabric deployment pipeline stages.
Plus, for my release pipeline I adapted the code from a previous post that covered how to run a Microsoft Fabric notebook from Azure DevOps, so that a notebook runs via API calls once it has been deployed to a new stage.
For the data source, I used the ‘sales.csv’ file that is available in the dp-data GitHub repository provided by Microsoft Learn. It is the csv file referenced in the exercise for the “Get started with lakehouses in Microsoft Fabric” Microsoft Learn module.
Changing authentication method to update different deployment pipeline stages
The first thing I did was change my release pipeline in Azure Pipelines, so that the tasks that call the deployment pipelines Fabric REST APIs authenticate with a Microsoft Entra ID account that has MFA disabled instead of a service principal.
I had to do this because at this moment in time service principals only support Power BI items when working with the deployment pipelines Fabric REST APIs. However, I needed to deploy Lakehouses and notebooks as well.
This meant that I had to change the below code.
$SecureStringPwd = ConvertTo-SecureString $(servicePrincipalKey) -AsPlainText -Force
$pscredential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $(servicePrincipalId), $SecureStringPwd
Connect-AzAccount -ServicePrincipal -Credential $pscredential -Tenant $(tenantId)
To the below code.
$SecureStringPwd = ConvertTo-SecureString '$(userpw)' -AsPlainText -Force
$pscredential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $(user), $SecureStringPwd
Connect-AzAccount -Credential $pscredential -Tenant $(tenantId)
In addition to this, I granted the user account permissions to the deployment pipeline and associated workspaces. I then ran the updated release pipeline to check that it worked okay.
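For context, the kind of request those release pipeline tasks end up sending when they deploy a stage looks something like the below sketch. It is only a rough outline based on the Power BI deployment pipelines 'Deploy All' endpoint, with $(pipelineId) as a placeholder pipeline variable for the deployment pipeline id, so treat it as illustrative rather than the exact code from my release pipeline.

# Rough sketch only: reuses the Connect-AzAccount session shown above
# Assumed token audience for the deployment pipelines REST APIs
$token = (Get-AzAccessToken -ResourceUrl "https://analysis.windows.net/powerbi/api").Token
$headers = @{ Authorization = "Bearer $token" }

# Deploy all supported content from the Development stage (order 0) to the next stage
$body = @{
    sourceStageOrder = 0
    options          = @{
        allowCreateArtifact    = $true
        allowOverwriteArtifact = $true
    }
} | ConvertTo-Json -Depth 3

Invoke-RestMethod -Method Post `
    -Uri "https://api.powerbi.com/v1.0/myorg/pipelines/$(pipelineId)/deployAll" `
    -Headers $headers -Body $body -ContentType "application/json"

A complete version would also wait for the deployment operation to finish, which I am leaving out here to keep the sketch short.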
Adding the Lakehouse and notebook to the deployment pipeline
I then created a new Lakehouse and a new notebook in the workspace that represents the Development stage of my Microsoft Fabric deployment pipeline.
Once they were in place, I started adding code to my notebook.
In order for this to work properly I first had to configure the default Lakehouse in the first cell.
%%configure
{
    "defaultLakehouse": {
        "name": "GIDemoLH"
    }
}
I then ran the below code in the notebook to clone the dp-data GitHub repository within my existing Spark session.
!git clone https://github.com/MicrosoftLearning/dp-data.git
Once the repository had been cloned, I ran the below code to copy the contents of the csv file into a pandas dataframe.
import pandas as pd
salesdf = pd.read_csv('dp-data/sales.csv')
I then wrote the contents of the pandas dataframe to a csv file in the Files location within the default Lakehouse, which highlights why it is a good idea to configure a default Lakehouse.
import pandas as pd
salesdf.to_csv("/lakehouse/default/Files/sales.csv")
I then ran the code and confirmed that the csv file appeared in the Files location in the Lakehouse.
Populating the Microsoft Fabric Lakehouse in the Test deployment stage
After creating everything in the workspace that represented the Development stage, I returned to Azure DevOps to update my release pipeline by adding an additional stage to populate the Lakehouse in the Test workspace.
I added the below three tasks to the new stage, which utilize the same logic that I covered in my post about how to run a Microsoft Fabric notebook from Azure DevOps, in order to run the notebook and add the csv file to the Lakehouse in the Test workspace.
You can view that post for the full code. However, I must add that I changed the variable values for the notebook name and the workspace id, because I needed the id for the workspace that represents the Test stage.
Plus, I changed the Advanced settings of my PowerShell tasks to use PowerShell Core, so that the -ResponseHeadersVariable parameter works with the Invoke-RestMethod cmdlet.
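To give you an idea of what those three tasks boil down to, below is a minimal sketch that starts the notebook as an on demand job and then polls it until it finishes. It assumes the Fabric job scheduler endpoint, with $(workspaceId) as a pipeline variable and $notebookId as a stand-in for the id of the deployed notebook, so treat it as an outline rather than the exact code from my tasks.

# Rough sketch only: assumes an existing Connect-AzAccount session and PowerShell Core
$fabricToken = (Get-AzAccessToken -ResourceUrl "https://api.fabric.microsoft.com").Token
$headers = @{ Authorization = "Bearer $fabricToken" }

# Start the notebook as an on demand job; the job instance URL comes back in the Location header
$runUri = "https://api.fabric.microsoft.com/v1/workspaces/$(workspaceId)/items/$notebookId/jobs/instances?jobType=RunNotebook"
Invoke-RestMethod -Method Post -Uri $runUri -Headers $headers -ResponseHeadersVariable responseHeaders

# Poll the job instance until it is no longer queued or running
do {
    Start-Sleep -Seconds 30
    $job = Invoke-RestMethod -Method Get -Uri "$($responseHeaders.Location)" -Headers $headers
    Write-Host "Notebook job status: $($job.status)"
} while ($job.status -in 'NotStarted', 'InProgress')

In the release pipeline itself this logic is spread across the three PowerShell tasks, and the earlier post walks through it in full.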
Afterwards, I ran the pipeline with the additional stage, and it completed successfully, as you can see below.
I then went into the workspace that represented my Test stage.
First, I confirmed that the Lakehouse and notebook had both been deployed. I then went into the Lakehouse in my Test stage and confirmed that the csv file was there.
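As a side note, you do not have to rely on the portal to check for the file. Because OneLake exposes an ADLS Gen2 compatible endpoint you can also list the Files location from PowerShell, along the lines of the below sketch. The workspace name is a placeholder, and the token audience and service version are my assumptions, so it may need adjusting (for example, for workspace names that contain spaces).

# Optional check via the OneLake ADLS Gen2 compatible endpoint (sketch only)
$storageToken = (Get-AzAccessToken -ResourceUrl "https://storage.azure.com/").Token
$oneLakeHeaders = @{
    Authorization  = "Bearer $storageToken"
    'x-ms-version' = '2021-06-08'   # assumed service version
}

# List the contents of the Files location of the GIDemoLH Lakehouse in the Test workspace
$listUri = "https://onelake.dfs.fabric.microsoft.com/TestWorkspaceName" +
    "?resource=filesystem&recursive=false&directory=GIDemoLH.Lakehouse/Files"
(Invoke-RestMethod -Method Get -Uri $listUri -Headers $oneLakeHeaders).paths | Select-Object name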
Final words
I hope showing one way you can populate a Microsoft Fabric Lakehouse in different deployment pipeline stages makes for an interesting read.
Because I want others to realize what you can do to populate the empty Lakehouses that are deployed as part of your lifecycle management within Microsoft Fabric. Even more so once service principal support finally happens.
In addition, I want to inspire others, because the reality is that you can be a lot more creative in your final solutions, including more complex code with multiple sources or even Data Pipelines.
Of course, if you have any comments or queries about this post feel free to reach out to me.