Thinking Behind Use of KEEPFILTERS

BrianJ · September 15, 2020, 11:33pm

I didn’t want to hijack this thread by asking the question there, but I’ve noticed in your solutions that you use KEEPFILTERS probably an order of magnitude more frequently than the rest of us combined. I was wondering if you could expand on your thinking as to when you choose to use this function?

Thanks!

Brian

P.S. Really elegant solution on this one.

AntrikshSharma · September 16, 2020, 5:46am

Thanks!!

There are few straight forward reasons before I explain why I use it so often.

KEEPFILTERS creates a SET intersection between what is written in the code and what is available in filter context outside CALCULATE: slicers, rows, columns
Makes it easier to write predicate/boolean conditions without overwriting the existing filter context. The end result is more readable and elegant looking
By using KF you are able generate more efficient queries with column filters, since now you don’t have to iterate a full dimension or fact table
Writing predicate ensures you only get unique existing combinations and KEEPFILTERS ensures that filters inside CALCULATE and outside CALCULATE intersect

Predicate statement = KEEPFILTERS ( Products[Color] IN { "Red", "Green", "Blue" } )

Non predicate equivalent =
FILTER ( Products, Products[Color] IN { "Red", "Green", "Blue" } )

Not used very commonly but you can use KEEPFILTERS with iterators too, in that case it creates an intersection between context transition and the existing filters.

Measure = SUMX ( KEEPFILTERS ( ALL ( Products[Color] ) ), [Total Sales] )

I am going to use Contoso dataset with 12.5 Million rows for the demonstration.

Let’s say you want to create a report showing sales only for trendy colors otherwise blank.

you would want to write the measure in the following way so that you want the sales of the colors that are trendy plus included in the slicer

Trendy colors =
CALCULATE (
    [Total Sales],
    FILTER ( Products, Products[Color] IN { "Red", "Green", "Blue" } )
)

So far everything is fine no issues. Now lets see the query generated by this measure.

Pay attention to the number of Rows this measure had to iterate, because we used a full table inside CALCULATE, a full scan is also done to retrieve the values.

If on the other hand I modify the measure a little bit by introducing KEEPFILTERS, look at the query generated and the result is same too!

Moving on to a more complex example.
Now we are trying to calculate sales amount where quantity * net price is greater than 1000.

Some might write the code like this:

Sales Amount GT 1000 = 
CALCULATE(
    [Total Sales],
    FILTER ( 
        Sales,
        Sales[Quantity] * Sales[Net Price] > 1000
    )
)

this works fine and you can interact with slicer and obtain the result depending on the quantity you select.

and same can be done with the following, look at the ALL statement it will contain unique combination of Quantity and Net price and once the product is greater that 1000 only the values of these 2 columns would be applied to the filter context.

In case of full table the all the columns of the sales would be applied to the filter context and that could be very expensive in case there are a lot of columns, and to be honest I don’t think you would need every column of a table to get the result. And the number of rows applied to the filter context are huge too!

Sales Amount GT 1000 KF =
CALCULATE (
    [Total Sales],
    KEEPFILTERS (
        FILTER (
            ALL ( Sales[Quantity], Sales[Net Price] ),
            Sales[Quantity] * Sales[Net Price] > 1000
        )
    )
)

Let’s pay attention to the queries generated by these 2

without KF query:

with KF query:

by now you can see how many rows the SE engine has to bring back to get the desired result.

Another example:

Let’s say you are slicing trendy colors by brands.

Measures used -

Trendy colors =
DIVIDE (
    CALCULATE (
        [Total Sales],
        FILTER ( Products, Products[Color] IN { "Red", "Green", "Blue" } )
    ),
    CALCULATE ( [Total Sales], ALL ( Sales ) )
)

Trendy Color without KF =
DIVIDE (
    CALCULATE ( [Total Sales], Products[Color] IN { "Red", "Green", "Blue" } ),
    CALCULATE ( [Total Sales], ALL ( Sales ) )
)

Trendy Color with KF =
DIVIDE (
    CALCULATE (
        [Total Sales],
        KEEPFILTERS ( Products[Color] IN { "Red", "Green", "Blue" } )
    ),
    CALCULATE ( [Total Sales], ALL ( Sales ) )
)

You can see all 3 measures return the same result, but the difference is evident when you slice by colors

That’s why I use KEEPFILTERS more often as it helps in creating elegant and efficient code But knowing when to use it is absolutely necessary.

Here is the snapshot of the data model, don’t hate me for snow flake schema that’s only for learning purpose

Harsh · September 16, 2020, 5:54am

@AntrikshSharma,

Superb explanation. Really found it helpful.

Absolute kudos to you.!!!

Thanks and Warm Regards,
Harsh

AntrikshSharma · September 16, 2020, 6:19am

Thanks Harsh!!!

BrianJ · September 16, 2020, 6:43am

@AntrikshSharma,

Magnificent explanation. Thanks so much for taking the time to walk through that in such detail, with such great examples.

The depth and breadth of your DAX knowledge is quite remarkable.

I learn something new on this forum every single day, but today even more than usual.

Well done, and thanks again!

Brian

Hemantsingh · September 16, 2020, 7:24am

Hi @AntrikshSharma,

Indeed a great Explanation. I too wanted to ask the same question as why do you often use keepfilters. After reading your explanation now i understand what makes your codes run faster.

Keep shining ,keep daxing…Cheers.

Regards,
Hemant

BrianJ · September 16, 2020, 7:31am

@sam.mckay and @EnterpriseDNA,

I think @AntrikshSharma’s analysis would make a fantastic entry on the Enterprise DNA blog.

Brian

AntrikshSharma · September 16, 2020, 7:41am

Thanks a lot!! @Hemantsingh and @BrianJ

Hemantsingh · September 16, 2020, 7:45am

@AntrikshSharma

Can you provide the link to contoso dataset that has 1.2 billion rows. One that i have only has 2 million. I am looking for a big dataset for performance testing of measures. Think your dataset for contoso can really help.

Regards,
Hemant

AntrikshSharma · September 16, 2020, 7:51am

Oh I totally forgot to edit that I generally count in the Indian way (crores) and then convert it to millions and billions, I made a mistake earlier while writing but didn’t edit it.

This database is the one provide by Microsoft and modified by SQLBI, if you have the DAX book from SQLBI there is a link to this dataset in the book.

The total rows are 12.5 Millions.

Hemantsingh · September 16, 2020, 7:59am

I have first version (PDF) of definitive guide, i guess it was published in 2016. Is that link in this book too??

M_K · September 16, 2020, 8:04am

Hi @AntrikshSharma
Thank you for taking the time to provide a detailed explanation.
I did find using KP more readable than FILTER (your response to my query on a different thread). When I applied your suggested measure with KP instead of FILTER on a fairly large data set, I felt the results were reporting slightly quicker (as I changed the values on slicers), but couldn’t understand why. Now I know
Thanks for sharing.

AntrikshSharma · September 16, 2020, 8:11am

Not sure about the 2016 version. It should be under the companion content section.

BrianJ · September 16, 2020, 2:26pm

@Hemantsingh,

Here’s a public dataset of NYC taxi trips (112 million rows), that I’ve seen commonly used for testing and demo purposes. Might be useful for your measure testing…

https://data.cityofnewyork.us/Transportation/2018-Yellow-Taxi-Trip-Data/t29m-gskq

Brian