SQL or M? – SSAS Partitions in Power Query/M

July 15, 2018July 15, 2018

In the data platform industry, we have been working with SQL for decades. It’s a powerful language and over many years, we’ve learned to work with it’s strengths and to understand and work around it’s idiosyncrasies. M is a considerably more modern and flexible query language. Some best practices have evolved but many are still learning the basic patterns of effective query design. Reaching that stage with a technology often takes years of trial-and-error design and a community willing to share their learnings. I will continue to share mine and appreciate so many in the community who share theirs.

Why Use M Instead of SQL?

For database professionals using SQL Server as the sole source of data for an SSAS or Power BI data model, there is a solid argument to be made in favor of encapsulating the query logic in database objects. DBAs need to manage access to important databases. A comment posted in an earlier post on this topic mentioned that SQL Server views can implement schema binding – which doesn’t allow a table or any other dependent object to be altered in such a way that it would break the view. This is a good design pattern that should be followed if you are the database owner, have the necessary permission and flexibility to manage database objects as part of your BI solution design. Ultimately, this is an organizational decision. If the BI solution developer is not the DBA, you may have limited options. If you don’t have control over the source database objects, if you are not using SQL Server or otherwise prefer to manage everything in the SSAS or Power BI project, Power Query is probably the right place to manage all the query logic.

In my earlier post, I used a table-valued user-defined function to manage the partition filtering logic in SQL Server. Rather than using SQL and database objects, we’ll use Power Query alone. The working M script is shown below.

For brevity, I’m starting by showing the solution but I will show you the steps we went through to get there a bit later.

Updating the Partition Definitions

The three partitions defined in the earlier example are replaced using the following M script, which returns exactly the same columns and rows as before.

Here is the M script for each partition:

“This week” partition:

let
Source = #”SQL/localhost;ContosoDW”,
SalesData = Source{[Schema=”dbo”,Item=”FactSalesCompleteDates”]}[Data],
#”Filtered Rows” =
Table.SelectRows( SalesData, each
[DateKey] >=
Date.StartOfWeek( DateTime.FixedLocalNow() )
)

in
#”Filtered Rows”

“This month before this week” partition:

let
Source = #”SQL/localhost;ContosoDW”,
SalesData = Source{[Schema=”dbo”,Item=”FactSalesCompleteDates”]}[Data],
#”Filtered Rows” =
Table.SelectRows( SalesData, each
[DateKey] >=
Date.StartOfMonth( DateTime.FixedLocalNow() )
and
[DateKey] <
Date.StartOfWeek( DateTime.FixedLocalNow() )
)

in
#”Filtered Rows”

“Before this month” partition:

let
Source = #”SQL/localhost;ContosoDW”,
SalesData = Source{[Schema=”dbo”,Item=”FactSalesCompleteDates”]}[Data],
#”Filtered Rows” =
Table.SelectRows( SalesData, each
[DateKey] <
Date.StartOfMonth( DateTime.FixedLocalNow() )
)

in
#”Filtered Rows”

The Power Query Litmus Test: Query Folding

When connected to an enterprise data source like SQL Server, Power Query should be able to pass an important test. Use the Design… button to view the Power Query Editor. Select the last query step and then right-click to show the menu.

If the View Native Query menu option is enabled, you are good. This means the the query is being folded – and that’s a good thing. Query folding converts the query steps into a native query for the database engine to execute. This is the resulting T-SQL query script generated by Power Query:

select [_].[SalesKey],
[_].[DateKey],
[_].[channelKey],
[_].[StoreKey],
[_].[ProductKey],
[_].[PromotionKey],
[_].[CurrencyKey],
[_].[UnitCost],
[_].[UnitPrice],
[_].[SalesQuantity],
[_].[ReturnQuantity],
[_].[ReturnAmount],
[_].[DiscountQuantity],
[_].[DiscountAmount],
[_].[TotalCost],
[_].[SalesAmount],
[_].[ETLLoadID],
[_].[LoadDate],
[_].[UpdateDate]

from [dbo].[FactSalesCompleteDates] as [_]

where [_].[DateKey] >= convert(datetime2, ‘2018-07-01 00:00:00’) and [_].[DateKey] < convert(datetime2, ‘2018-07-15 00:00:00’)

You don’t need to do anything with this information. It’s just good to know. End of story.

And Now… The Rest of The Story

Power Query is an awesome tool that does some amazingly smart things with the simple data transformation steps you create in the designer. However, it is important to make sure Power Query produces efficient queries. During the development of this solution, I created an early prototype that didn’t produce a query that would fold into T-SQL. Thanks to Brian Grant, who is an absolute genius with Power Query and M, for figuring this out (BTW, you can visit Brian’s YouTube tutorial collection here).

In my original design which I prototyped in Power BI Desktop, I thought it would make sense to create custom columns for each of the date parts needed to filter the partitions. Here’s the prototype query for the query I originally named “This Month Thru Last Week”:

let
Source = FactSales,

#”Add DateTimeNow” = Table.AddColumn(Source, “DateTimeNow”, each DateTime.LocalNow()),
#”Change Type DateTime” = Table.TransformColumnTypes(#”Add DateTimeNow”,{{“DateTimeNow”, type datetime}}),
#”Add StartOfThisWeek” = Table.AddColumn(#”Change Type DateTime”, “StartOfThisWeek”, each Date.StartOfWeek([DateTimeNow]), type date),
#”Add StartOfThisMonth” = Table.AddColumn(#”Add StartOfThisWeek”, “StartOfThisMonth”, each Date.StartOfMonth([DateTimeNow]), type date),
#”Add StartOfPreviousMonth” = Table.AddColumn(#”Add StartOfThisMonth”, “StartOfPreviousMonth”, each Date.StartOfMonth(Date.AddMonths([DateTimeNow], -1)), type date),
#”Partition Filter” = Table.SelectRows(#”Add StartOfPreviousMonth”, each ([DateKey] >= [StartOfThisMonth] and [DateKey] < [StartOfThisWeek]) ),
#”Removed Columns” = Table.RemoveColumns(#”Partition Filter”,{“ETLLoadID”, “LoadDate”, “UpdateDate”, “DateTimeNow”, “StartOfThisWeek”, “StartOfThisMonth”, “StartOfPreviousMonth”})

in
#”Removed Columns”

As you can see, I created separate columns using Transform menu options based on the current date and time, stored in a custom column named “DateTimeNow”:

StartOfThisWeek
StartOfThisMonth
StartOfPreviousMonth

The rest was simple, I just added filters using these columns and then removed the custom columns from the query in the last step. All good with one small exception… it didn’t work. We learned that Power Query can’t use custom column values to build a foldable filter expression. The filters just won’t translate into a T-SQL WHERE clause.

Checking the last query step with a right-click shows that the “View Native Query” menu option is grayed-out so No Folding For You!

Simple lesson: When query folding doesn’t work, do something else. In this case, we just had to put the date comparison logic in the filter steps and not in custom columns.

Paul Turley

Microsoft Data Platform MVP, Principal Consultant for 3Cloud Solutions Specializing in Business Intelligence, SQL Server solutions, Power BI, Analysis Services & Reporting Services.

Power BI Direct Lake and DirectQuery in the Age of Fabric

I just returned from the Microsoft Fabric Community Conference in Las Vegas. Over 4,000 attendees saw a lot of demos showing how to effortlessly build a modern data platform with petabytes of data in One Lake, and then ask CoPilot to generate beautiful Power BI reports from semantic models that magically appear from data in a Fabric Lakehouse. Is Direct Lake the silver bullet solution that will finally deliver incredibly fast analytic reporting over huge volumes of data in any form, in real time? Will Direct Lake models replace Import model and solve the dreaded DirectQuery mode performance problems of the past? The answer is No, but Direct Lake can break some barriers. This post is a continuation of my previous post titled “Moving from Power BI to Microsoft fabric”.

Direct Lake is a new semantic model storage mode introduced in Microsoft Fabric, available to enterprise customers using Power BI Premium and Fabric capacities. It is an extension of the Analysis Services Vertipaq in-memory analytic engine that reads data directly from the Delta-parquet structured storage files in a Fabric lakehouse or warehouse.

Moving from Power BI to Microsoft Fabric

Fabric is here but what does that mean if you are using Power BI? What do you need to know and what, if anything will you need to change if you are a Power BI report designer, developer or BI solution architect? What parts of Fabric should you use now and how do you plan for the near-term future? As I write this in March of 2024, I’m at the Microsoft MVP Summit at the Microsoft campus in Redmond, Washington this week learning about what the product teams will be working on over the next year or so. Fabric is center stage in every conversation and session. To say that Fabric has moved my cheese would be a gross understatement. I’ve been working with data and reporting solutions for about 30 years and have seen many products come and go. Everything I knew about working with databases, data warehouses, transforming and reporting on data has changed recently BUT it doesn’t mean that everyone using Power BI must stop what they are doing and adapt to these changes. The core product is unchanged. Power BI still works as it always has.

The introduction of Microsoft Fabric in various preview releases over the past two years have immersed me into the world of Spark, Python, parquet-Delta storage, lakehouses and medallion data warehouse architectures. These technologies, significantly different from the SQL Server suite of products I’ve known and loved for the past twenty years, represent a major shift in direction, forming the backbone of OneLake; Microsoft’s universal integrated data platform that hosts all the components comprising Fabric. They built all of Fabric on top of the existing Power BI service, so all of the data workloads live inside familiar workspaces, accessible through the Power BI web-based portal (now called the Fabric portal).

CI/CD & DevOps for Power BI… Are We There Yet?

In my view, projects and teams of different sizes have different needs. I described DevOps maturity as a pyramid, where most projects don’t require a sophisticated DevOps implementation, and the most complex solutions do. The DevOps maturity is a progression, but only for projects of a certain scale. One of the following options might simply be the best fit for a particular project.
Unless you are throwing together a simple Power BI report that you don’t plan to maintain and add features to, the first and most basic managed project should start with a PBIX file or Power BI Project folder stored in a shared and cloud-backed storage location.
DevOps isn’t a requirement for all projects, but version control and shared file storage definitely is.

SQL or M? – SSAS Partitions in Power Query/M

Why Use M Instead of SQL?

Updating the Partition Definitions

The Power Query Litmus Test: Query Folding

And Now… The Rest of The Story

Like this:

Paul Turley

4 thoughts on “SQL or M? – SSAS Partitions in Power Query/M”

Leave a ReplyCancel reply

Why Use M Instead of SQL?

Updating the Partition Definitions

The Power Query Litmus Test: Query Folding

And Now… The Rest of The Story

Share this:

Like this:

Paul Turley

Related Posts

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

4 thoughts on “SQL or M? – SSAS Partitions in Power Query/M”

Leave a ReplyCancel reply

Discover more from Paul Turley's SQL Server BI Blog