Recently I wrote about the arrival of the modern data warehouse, per Microsoft Azure design concepts. In that post I walked you through an introduction to what this really means, from a high-level perspective. This included a walk down memory lane as it pertains to the birth and evolution of the concept we call data warehousing.
With this you were able to establish a solid overview of what is coming with this new approach and paradigm regarding design and architecture. It's always important to understand where you have been. By giving due priority to understanding the past, you're able to better anticipate the future. You will then better understand where you are going, in relation to Microsoft's modern data warehouse architecture and design.
Now that you have established this baseline understanding, it is time to break apart the elements that make up this architecture. When I think in terms of a design for a data warehouse, the target that will house the business-friendly semantics and dimensional models is always an initial focus.
Looking at the modern architecture, Microsoft mostly showcases an Azure SQL Data Warehouse instance being populated via PolyBase queries over ingested data that lives in an Azure Blob storage instance.
I will state here, however, that unless you have a data size of 1 terabyte or larger, the cost and scale profile of an Azure SQL Data Warehouse instance is hard to justify. As for PolyBase, this query technology works great with Azure SQL Data Warehouse, and it is also available on-premises in Microsoft SQL Server 2016 and later editions. As of this writing, you cannot target and execute PolyBase queries from within the context of an Azure SQL DB. This might change with the now-in-preview, and soon-to-be GA, managed instance of Azure SQL.
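To make the PolyBase pattern concrete, here is a hedged T-SQL sketch of loading staged blob data into Azure SQL Data Warehouse. All object names, the storage account, the container, and the column list are placeholders for illustration; your own staged files and schema will differ.

```sql
-- A master key is required before creating a scoped credential (one-time setup).
CREATE MASTER KEY;

-- Credential holding the storage account access key (placeholder secret).
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCred
WITH IDENTITY = 'user',
     SECRET = '<storage-account-access-key>';

-- External data source pointing at a hypothetical 'staging' blob container.
CREATE EXTERNAL DATA SOURCE StagedSalesData
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://staging@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobStorageCred
);

-- Describe the file layout of the staged extracts (CSV with a header row).
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)
);

-- The external table is metadata only; the data stays in blob storage
-- until queried or loaded.
CREATE EXTERNAL TABLE ext.SalesStaging (
    SaleId   INT,
    SaleDate DATE,
    Amount   DECIMAL(18, 2)
)
WITH (
    DATA_SOURCE = StagedSalesData,
    FILE_FORMAT = CsvFormat,
    LOCATION = '/sales/'
);

-- Land the data in the warehouse with CTAS (CREATE TABLE AS SELECT).
CREATE TABLE dbo.FactSales
WITH (DISTRIBUTION = HASH(SaleId))
AS SELECT * FROM ext.SalesStaging;
```

This is the ingestion pattern the architecture diagram implies: files land in blob storage, PolyBase exposes them as external tables, and a CTAS statement performs the parallel load into the warehouse.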
OK, so there are a lot of options and topics in those last two paragraphs as we focus on the target that will house your data warehouse dimensional models, semantics, and business data. With that, let's take a breath together, and then let me take you through these options for Azure SQL and what we are seeing in the wild as well.
First, let's use an amazing resource that you should completely consume, reference, and understand to help clear up the options and choices relating to Azure SQL. What I'm referencing is the "What is the Azure SQL Database service?" online documentation.
Recently Microsoft released the new option of a managed instance of Azure SQL. This brings a cloud instance of SQL to near parity with an on-premises instance. That stated, there are still times when you would want to select a single DTU-based instance, or work with an elastic pool.
Nevertheless, within the SMC / mid-market we are seeing that the majority of current cloud-based data warehouses are less than 500 GB. That means an Azure SQL Data Warehouse instance doesn't make sense from a cost and scale perspective. For the majority of such deployed instances, we are using Azure SQL DB single databases for customers like you: specifically, starting within the Standard service objective tier, and moving into the Premium tier as faster I/O is desired or needed for a production-level instance of a data warehouse in the cloud.
When looking at which option to choose here, the first rule of thumb is your need and desire for near parity between your cloud data warehouse and what you would have in an on-premises implementation. Options like SQL Agent return when you deploy a managed instance vs. a single database. There are some pros, however, to working with a single Azure SQL DB. One very useful element is the ability to programmatically scale a single Azure SQL DB using the ALTER DATABASE statement, setting the SERVICE_OBJECTIVE from an S1 to an S4 during peak usage or loads, for example.
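That scale-up/scale-down pattern can be sketched in a few T-SQL statements. This is a minimal example; the database name is a placeholder, and note that the service objective change is asynchronous, so a load job should poll for completion before starting heavy work.

```sql
-- Scale up before a heavy nightly load ('MyDwDb' is a placeholder name).
ALTER DATABASE MyDwDb MODIFY (SERVICE_OBJECTIVE = 'S4');

-- The change is asynchronous; check the current objective while it completes.
SELECT DATABASEPROPERTYEX('MyDwDb', 'ServiceObjective') AS CurrentObjective;

-- Scale back down once the load finishes, to stop paying for S4 compute.
ALTER DATABASE MyDwDb MODIFY (SERVICE_OBJECTIVE = 'S1');
```

Wrapping these statements around your ETL window is a simple way to pay for Premium-class performance only during the hours you actually need it.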
With a managed instance, such scaling up and down is actually not as easy. You can scale a managed instance through the PowerShell cmdlets that exist for managing an instance of Azure SQL DB. Further, if you look at the compute power on which a managed instance operates vs. a single DB, you set up a managed instance by selecting a vCore option. For the Business Critical managed instances, there is a fixed amount of RAM aligned per vCore. As of this writing, that alignment is 5.5 GB of RAM per vCore. Therefore, an 8 vCore managed instance gives you 44 GB of dedicated RAM to work with; a 16 vCore setup translates to 88 GB. Further, with the Business Critical option selected, you get access to the faster I/O disk options, similar to what the Premium tier offers for a DTU-based single Azure SQL DB.
For more detailed information I recommend you consume the following resource: What is a Managed Instance (preview)?
Azure SQL Database Managed Instance (preview) is a new deployment model of Azure SQL Database, providing near 100% compatibility with the latest SQL Server on-premises (Enterprise Edition) Database Engine, a native virtual network (VNet) implementation that addresses common security concerns, and a business model favorable for on-premises SQL Server customers. Managed Instance allows existing SQL Server customers to lift and shift their on-premises applications to the cloud with minimal application and database changes. At the same time, Managed Instance preserves all PaaS capabilities (automatic patching and version updates, automated backups, high availability), which drastically reduces management overhead and TCO.
For this post, we now have enough to understand the options that exist in relation to the Microsoft Azure modern data warehouse design. It is really critical to make the best choice for the targeted to-be data warehouse here, which is why I've spent over 1,000 words helping you understand the options for this element.
In the next installment of this topic, I will move to the ingestion of data itself and the options we have for consuming and working with source data. Further, as we progress through the modern data warehouse design, you will see how this can impact the options we have for the advanced data stories we craft as the end result of this work. I hope you have enjoyed this series of posts thus far, and I welcome your feedback, comments, and questions.
I do have a question from the previous post that I will address between this post and the next in this series. Specifically, I was asked about the changes and impact seen from a developer viewpoint in what the modern data warehouse brings, compared to what traditional data warehouse design offers.
Until next time!
Brandon George | GoPowerBIPro.com | Oppia.co | ArbelaTech.com | Master Data Story Teller