Microsoft Azure & Snowflake Big Data Solution

Himanshu Negi
4 min read · Dec 25, 2020

Big data refers to data that would typically be too expensive to store, manage, and analyze using traditional database systems, which is why organizations are moving toward cloud computing. Cloud computing offers data storage, processing, and analytics on a more scalable, flexible, cost-effective, and even more secure basis than an on-premises deployment can achieve. Instead of investing heavily in data centers or servers before you know how you will use them, you pay only when you consume resources, and only for how much you consume: essentially a pay-as-you-go model. Microsoft Azure is one of the biggest cloud platforms, and this article is a quick-start guide to building a platform for storage, processing, and analytics using some of the SaaS offerings Microsoft provides. I will be using Azure Data Lake Gen2, a highly scalable storage system; Azure Databricks, which runs on the Apache Spark engine, to process the big data; and finally Azure Synapse Analytics and Snowflake (hosted on Azure) for enterprise data warehousing.

Let’s start by creating a storage account in Azure. If you don’t have an Azure account, you can create a free tier account from here. On the Azure portal menu, select All services and then select Storage accounts.

Navigate to the newly created storage account, look for Blob service, and then create a container.

Before sending data to the container, you will need an access key. Go to the storage account, open Access keys, click Show keys, and copy the connection string from either of the two keys.

Let’s send some data to our container. I will run a Python script to send data from Kafka to Azure Data Lake. You can find the code here. If you want to set up Kafka, you can follow this article.
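The linked script has the full details; as a minimal sketch of the same flow, assuming the kafka-python and azure-storage-file-datalake packages, with the topic name, container, and file path below as placeholders:

# Minimal sketch of the Kafka -> Azure Data Lake Gen2 flow (not the exact linked code).
from kafka import KafkaConsumer
from azure.storage.filedatalake import DataLakeServiceClient

CONNECTION_STRING = "<storage-connection-string>"  # copied from Access keys
CONTAINER = "raw"                                  # container created above
FILE_PATH = "events/sample.json"                   # placeholder path

consumer = KafkaConsumer("my-topic",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")

# Collect a small batch of messages for this demo.
records = []
for i, msg in enumerate(consumer):
    records.append(msg.value.decode("utf-8"))
    if i >= 99:
        break

# Write the batch to the data lake as a single newline-delimited file.
service = DataLakeServiceClient.from_connection_string(CONNECTION_STRING)
file_client = service.get_file_system_client(CONTAINER).create_file(FILE_PATH)
data = "\n".join(records).encode("utf-8")
file_client.append_data(data, offset=0, length=len(data))
file_client.flush_data(len(data))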

A new file is created. Now let’s create a Synapse workspace.

Now go to the workspace and open Synapse Studio. From the left pane, click Manage and create a SQL pool.

After a few minutes, you should have a SQL pool ready. Now go to Develop → SQL script, create a table (see the sketch below), and create a database master key:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '23987hxJ#KLt6874nl0zBe';
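The table layout depends on your payload; as a hypothetical example, a round-robin distributed heap table with a few event columns keeps the demo simple:

-- Hypothetical columns; adjust to match your Kafka payload.
CREATE TABLE dbo.KafkaEvents
(
    EventId    INT,
    EventType  NVARCHAR(100),
    EventTime  DATETIME2,
    Payload    NVARCHAR(4000)
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);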

Now we will create an Azure Databricks workspace and launch it.

Now open a new notebook and read the data from Azure Data Lake into a Spark DataFrame.
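A minimal sketch of the read, assuming the file is newline-delimited JSON and authenticating with the storage account key (account name, key, and path are placeholders; the linked code may authenticate with a service principal instead):

# Authenticate to ADLS Gen2 with the storage account key and read the file.
spark.conf.set(
    "fs.azure.account.key.bigdatalakegen2.dfs.core.windows.net",
    "<storage-account-key>")

df = (spark.read
      .format("json")
      .load("abfss://raw@bigdatalakegen2.dfs.core.windows.net/events/sample.json"))

df.show(5)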

Now, let’s write this data to the Synapse SQL table we created earlier. You can find the complete code here.
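One common approach in Databricks is the built-in Azure Synapse (sqldw) connector, which stages data through the lake and relies on the database master key created earlier. A sketch with placeholder workspace, credentials, and staging directory:

# Sketch: write the DataFrame to the Synapse dedicated SQL pool.
# All names, credentials, and the tempDir are placeholders.
jdbc_url = ("jdbc:sqlserver://<synapse-workspace>.sql.azuresynapse.net:1433;"
            "database=<sql-pool>;user=<user>;password=<password>;"
            "encrypt=true;loginTimeout=30;")

(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", jdbc_url)
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.KafkaEvents")
   .option("tempDir", "abfss://raw@bigdatalakegen2.dfs.core.windows.net/tempdir")
   .mode("append")
   .save())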

Now let’s push the data to the Snowflake data warehouse as well. You can create a trial account from here. I am using a Snowflake account that runs on Azure. First, create a virtual warehouse.
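For example, a small auto-suspending warehouse is enough for this walkthrough (the name is a placeholder):

-- Placeholder name; XSMALL with auto-suspend keeps costs low.
CREATE WAREHOUSE IF NOT EXISTS demo_wh
  WITH WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;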

Now, create a database, schema, and table.
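A minimal sketch, with placeholder names and columns mirroring the Synapse table above:

-- Placeholder objects; adjust the columns to your payload.
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE SCHEMA IF NOT EXISTS demo_db.events;

CREATE TABLE IF NOT EXISTS demo_db.events.kafka_events (
    event_id    INTEGER,
    event_type  STRING,
    event_time  TIMESTAMP_NTZ,
    payload     STRING
);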

Let’s load some data into the Snowflake table using Databricks.
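A sketch using the Spark-Snowflake connector that ships with Databricks; credentials and object names are placeholders (in practice, keep them in a Databricks secret scope):

# Sketch: write the same DataFrame to Snowflake.
# All connection values below are placeholders.
sf_options = {
    "sfUrl": "<account>.azure.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "demo_db",
    "sfSchema": "events",
    "sfWarehouse": "demo_wh",
}

(df.write
   .format("snowflake")
   .options(**sf_options)
   .option("dbtable", "kafka_events")
   .mode("append")
   .save())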

Now query the data using Snowflake’s worksheet. First, select the warehouse, database, and schema.
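Using the placeholder names from above:

USE WAREHOUSE demo_wh;
USE DATABASE demo_db;
USE SCHEMA demo_db.events;

SELECT * FROM kafka_events LIMIT 10;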

That’s it, folks. I hope this was helpful.
