Filtering a measure created in a virtual table


#1

Sample Table.pbix (536.6 KB)
I’ve worked my way through the creation of a virtual table that calculates the distance between Lat/Long coordinates for suburbs and Lat/Long coordinates for stores. When I materialise this virtual table through the “Make Table” function I’m able to obtain a table of Suburb Name, Store Name and Distance… which is great except it’s 4.3 million rows and creates performance issues for the calculations that use the table. What I want to achieve is a means of reducing the size of the table by filtering the Distance column ie. Suburb to Store distances < 10Km.
The Dax code is

   VAR Final_Table =    
    SELECTCOLUMNS( Crossjoin_Table, // crossjoin of Suburb, Lat, Long and Store, Lat,Long 
                   "Sub_Name", 'Sub Heirarchy'[Name],                
                    "Store_Name", 'Retailers Table'[Store Name],
                    "Sub-Store Dist", [Dist] // a calculation of distance between Suburbs and Stores... 4.3M values.
    Return

However, I’ve tried every possible permutation and combination of dax expressions to filter the distance values of the Sub-Store Dist field without success. The major issue being the Field is not available for use in a Calculate or Calculatetable statement and only appears in the following code,

SUMMARIZE(Final_Table,
[Sub_Name],
[Store_Name],
[Sub-Store Dist])

which returns the 4.3M rows. Any suggestions on how I can reference this field in a Calculatetable function so I can reduce the number of rows returned by the table? A sample file containing data and the current Make Table Dax calculation is attached.


#2

Just having a look through.

Any reason why this can’t just be completed in the query editor?

VAR Sub_Table =
    SUMMARIZECOLUMNS (
        'Suburb sample table'[Sub_NAME],
        "Lat1", AVERAGE('Suburb sample table'[Lat1]),
        "Long1", AVERAGE('Suburb sample table'[Long1])
    )
VAR Store_table =
    SUMMARIZECOLUMNS (
        'Store sample table'[Store Name],
        "Lat2", AVERAGE ( 'Store sample table'[Lat2] ),
        "Long2", AVERAGE ( 'Store sample table'[Long2] )
    )
VAR Crossjoin_Table =
    GENERATE ( Sub_Table, Store_table )

My feeling is that is could?

Then if you have this couldn’t you complete all your calculation is DAX measures rather than through this virtual table.

My guess is you could because you would already have the physical table in your model. Then you could just build off that.

The reason it’s so slow in my opinion is the number of iterations the current formula has to complete. This is via the SUMMARIZECOLUMNS, CROSSJOIN then ultimate SUMMARIZE.

What do you think? Could you re-imagine how you do this?

Personally this is how I would do it. Mostly within DAX measures. This is where the most optimization comes through complete a number of calculations.

Let me know what you think.

Thanks


#3

Sam,

The problem is that I filter the table using different distance measures ie. <;2.5, 2.5 – 5, 5 -10. Crossjoining in M is also very slow. I’ve also tried crossjoining in Access and then importing the table, but, import and refresh are slow.

The other issue is the next iteration of the measure is to a lower level than suburb. When I crossjoin mesh blocks with store location the table size increases to 110M rows of data and I’m blowing my PC’s memory capacity in just creating the table.


#4

Sam,
The solution was to filter the table called in the Summarize function and not filtering the field returned by the table. When the code is adjusted as per below;

SUMMARIZE(
FILTER(Final_Table, [Sub-Store Dist] <= 2.5),
[Sub_Name],
[Store_Name],
[Sub-Store Dist])

the number of returned rows is reduced. Performance is still an issue though as we are still filtering a 4.3M row table… which is doable but, it will become a problem when I go to the next level and need to filter the 110M row table.


#5

I’m a big believer this should be done with DAX measures and not within the table calculation.

Measures work way faster than when you run queries like that and I believe will also simplify this here

For example

testing =
CROSSJOIN( VALUES( ‘Store sample table’[Store Name] ), VALUES( ‘Suburb sample table’[Sub_NAME] ) )

All the below are then calculated in measures

image

Then you can just branch out from here with your additional calculations.

This should help in many ways.

First your initial table (CROSSJION one) is much smaller because you don’t have millions of rows of lat long number physical in the table now.

Measures are only called when you use them which is way more efficient.

Also your measures can be simple. Build one layer up at a time and be able to audit your calculations easier at every step.

Also measure are just far more optimize to scan longer tables.

Another way I thought you could reduce the size of you CROSSJOIN table, expecially if there is no reason for the further out regions,

Is to have a think about filter the suburbs for each store.

Like for example here

testing = 
FILTER(
    CROSSJOIN( VALUES('Suburb sample table'[Sub_NAME] ), VALUES( 'Store sample table'[Store Name] ) ),
        CALCULATE( AVERAGE( 'Suburb sample table'[Lat1] ) ) > CALCULATE( AVERAGE( 'Store sample table'[Lat2] ) ) + 2.5 )

This is just an example to have a think about.

Let me know what you think.

I’m big on the DAX part. I don’t believe you should be completing all this calculation inside the query of the table.

Also reducing the initial size of the table will reduce the overall file size as well. You want to avoid have 10s or millions of rows if you can.

Attached
Lat-Long calc example.pbix (545.5 KB)


#6

Sam,
Thanks, some good ideas here. Your “Testing” table is using a technique I’ve not seen before… very clever. If I’m reading it correctly, you are visualising the table through the crossjoin but, filtering the table so only those with a Lat1 > Lat2 (by 2.5) are shown and using Calculate to set the context for the returned Lat/Lng.
I get the idea but, i’m not sure if filtering on the Lat or Long fields is an option. I’ll have to investigate that further. I still come back to having to calculate the distance (although this can now be a DAX measure) called as part of the expression component of the Filter statement.


#7

Cool,

Yea my idea was to try to cut down the initial table as much as you can. That will prevent the 10s of millions of rows initially I would suspect.

Then you should be able to easily then filter this down even more by placing that more complex calcs in measures.

Chrs
Sam