I have been working over the last few days on a new piece of logic that finally works, but it also increased the model size by 50%!
I know there is a better way to do it, but it requires a complex DAX formula that I do not know how to handle. As I have learnt, I have divided everything into sequential steps.
Context: We have defined different customer segments, so depending on which products the customer chooses, the order is assigned to one segment or another. We have segments such as 'Prefer rice' or 'Prefer chicken', so we can understand and develop each customer segment separately.
The fact table 'Sales Table' has one row for each order line.

| Order ID | Qty | Product | Total Order Line | Total Order |
|---|---|---|---|---|
| x | 1 | Bread | 1€ | 49,26€ |
| x | 1,3 | Rice | 13,26€ | 49,26€ |
| x | 1 | Food product | 11,00€ | 49,26€ |
| x | 2 | Food product | 24,00€ | 49,26€ |
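For reference, the `[% Sales line over order]` measure used throughout the thread is never posted. A minimal sketch of what it might look like, assuming the order key is `[id_Unica]` (as in the code further down) and the line amount is `[Total Order Line]`:

```dax
% Sales line over order =
DIVIDE (
    SUM ( 'Sales Table'[Total Order Line] ),    // lines in the current context
    CALCULATE (
        SUM ( 'Sales Table'[Total Order Line] ),
        ALLEXCEPT ( 'Sales Table', 'Sales Table'[id_Unica] )    // total of the whole order
    )
)
```

This is only an assumption about the model; the actual measure in the .pbix may differ.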
First Step. I have created a virtual table called 'Orders summarized and ranked' that summarizes each order by its related segment. I have also created a new column that ranks each segment by its sales importance.
| Order ID | Segment | Weight | Rank |
|---|---|---|---|
| x | Bread | 2,03% | 3 |
| x | Rice | 26,91% | 2 |
| x | Food | 71,05% | 1 |
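As a sketch, the calculated table behind this step (assuming the `[% Sales line over order]` measure and the `[id_Unica]` / `[Segmento]` columns that appear later in the thread) could look like this. Note that a calculated table like this is physical, not virtual, which is what inflates the model:

```dax
Orders summarized and ranked =
VAR Summarized =
    ADDCOLUMNS (
        SUMMARIZE (
            'Sales Table',
            'Sales Table'[id_Unica],    // Order id
            'Products'[Segmento]
        ),
        "Weight", [% Sales line over order]
    )
RETURN
    ADDCOLUMNS (
        Summarized,
        "Rank",
            RANKX (
                FILTER ( Summarized, [id_Unica] = EARLIER ( [id_Unica] ) ),
                [Weight],
                ,
                ,
                DENSE
            )
    )
```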
Second Step. I have created a virtual table 'Orders with Top 1 segment' that captures, from the previous table, the Order ID and the segment in the number 1 position.
| Order ID | Segment |
|---|---|
| x | Food |
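A sketch of this second table as another calculated table, assuming the first step was materialized as 'Orders summarized and ranked' with the column names above:

```dax
Orders with Top 1 segment =
SELECTCOLUMNS (
    FILTER ( 'Orders summarized and ranked', [Rank] = 1 ),    // keep only the top-ranked segment per order
    "id_Unica", [id_Unica],
    "Segmento", [Segmento]
)
```

For RELATED to work in the third step, this table would also need a one-to-many relationship to 'Sales Table' on the order id.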
Third Step. Finally, I have created a new column in the Sales Table with the segment ID: `Segment id = RELATED ( 'Orders with Top 1 segment'[Segmento] )`. Attachment: Sales DNA Forum.pbix (1.8 MB)
I know I should have used TOPN instead of RANKX, but it didn't work. Besides, the main problem is the model size; I guess the virtual tables are not so 'virtual'.
I know it can be done through a DAX formula so that the logic is only applied to the filter context (dates and stores) and not to the whole Sales Table.
The issue here is that you’re not actually using virtual tables – you are using physical tables created via DAX expressions. Virtual tables exist only within the context of measures. It might sound like semantic hairsplitting, but the implications for size and potentially performance are significant.
If you want to reduce the file size, you will need to create truly virtual tables using variables within a measure. Here's a recent example of extensive virtual table manipulation I did that used a stepwise approach similar to the one you used above for your DAX expression tables. (Note: the example is somewhat of a nonsense approach, since DAX was not the proper tool in that case, but I think it is a good illustration of the power of manipulating virtual tables through variables.)
```dax
1st Segment =
VAR OrdersSummarized =
    // New virtual table: group by orders & segments, calculating the weight
    SUMMARIZECOLUMNS (
        'Sales Table'[id_Unica],    // Order id
        'Products'[Segmento],
        "Weight", [% Sales line over order]
    )
VAR OrdersSummarizedRanked =
    // I create a new table with a new column that ranks each segment, so I can choose the first later on
    ADDCOLUMNS (
        OrdersSummarized,
        "Rank",
            RANKX (
                FILTER ( OrdersSummarized, [id_Unica] = EARLIER ( [id_Unica] ) ),
                [Weight],
                ,
                ,
                DENSE
            )
    )
VAR Top1Segment =
    CALCULATE (
        [Max Segment],    // a measure that gives me the segment id for each order in context
        FILTER ( OrdersSummarizedRanked, [Rank] = 1 )    // I expect to get just the 1st ranked segment for each order
    )
RETURN
    Top1Segment
```
Well, it finally worked… There was a problem with SUMMARIZECOLUMNS… Still too much to learn. Here is the code:
```dax
1st segment w/ Top N =
VAR OrdersSummarized =
    // New virtual table: group by orders & segments, calculating the weight
    SUMMARIZE (
        'Sales Table',
        'Sales Table'[id_Unica],    // Order id
        'Products'[Segmento],
        "Weight", [% Sales line over order]
    )
VAR OrdersSummarizedRanked =
    // I create a new table with a new column that ranks each segment, so I can choose the first later on
    ADDCOLUMNS (
        OrdersSummarized,
        "Rank",
            RANKX (
                FILTER ( OrdersSummarized, [id_Unica] = EARLIER ( [id_Unica] ) ),
                [Weight],
                ,
                ,
                DENSE
            )
    )
VAR Result =
    CALCULATE ( [Min Segmento], FILTER ( OrdersSummarizedRanked, [Rank] = 1 ) )
RETURN
    Result
```
Nice work – you're 98% of the way there. Just two changes needed: one structural and one recommended notational.
Structural: don't use SUMMARIZE to add measure-based virtual columns to a virtual table; this can produce irregular results. Instead, nest SUMMARIZE inside ADDCOLUMNS (or alternatively use SUMMARIZECOLUMNS). Here's how your first variable will change:
```dax
VAR OrdersSummarized =
    ADDCOLUMNS (
        SUMMARIZE (
            'Sales Table',
            'Sales Table'[id_Unica],    // Order id
            'Products'[Segmento]
        ),
        "@Weight", [% Sales line over order]
    )
```
If you want to understand the full rationale behind wrapping SUMMARIZE in ADDCOLUMNS, I'd refer you to this article:
Notational: you will note above that when I added the measure column to the virtual table, I put an "@" in front of Weight. This is a best practice to distinguish calling a virtual column from calling a measure, suggested by Ferrari in this article, which we as the expert team have generally adopted in our solutions on the forum.
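A small sketch of why the "@" prefix helps (hypothetical continuation of the variable above): inside a row context over the virtual table, `[@Weight]` can only be the virtual column, while an un-prefixed bracket reference like `[% Sales line over order]` is clearly a measure call that triggers context transition.

```dax
VAR OrdersSummarizedRanked =
    ADDCOLUMNS (
        OrdersSummarized,
        "Rank",
            RANKX (
                OrdersSummarized,
                [@Weight],    // unambiguously the virtual column added above, not a measure
                ,
                ,
                DENSE
            )
    )
```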
That's because SUMMARIZECOLUMNS can't work in a filter context modified by context transition. Power BI generates its visual queries using SUMMARIZECOLUMNS, and the outer SUMMARIZECOLUMNS will modify the filter context for the inner SUMMARIZECOLUMNS. Here is an example:
```dax
// DAX Query
EVALUATE
SUMMARIZECOLUMNS (
    -- This is what causes the problem; it is a known issue and I don't think Microsoft has any plans to resolve it
    ROLLUPADDISSUBTOTAL (
        'Products'[Brand],
        "IsGrandTotalRowTotal"
    ),    -- This populates the grand totals
    "Total_Sales", 'SalesMeasures'[Total Sales]
)
```
If I include a measure which uses SUMMARIZECOLUMNS in the matrix, it will break.
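As a sketch of the failing pattern (hypothetical measure name): a measure built on SUMMARIZECOLUMNS evaluates fine in a top-level EVALUATE query, but errors as soon as a matrix cell evaluates it under context transition, which is why the SUMMARIZE-inside-ADDCOLUMNS pattern above is the safe choice for measures.

```dax
-- Works in a plain EVALUATE query, but fails inside a matrix visual,
-- because each cell evaluates the measure in a filter context
-- modified by context transition
Breaks In Matrix =
COUNTROWS (
    SUMMARIZECOLUMNS (
        'Products'[Segmento],
        "@Sales", [Total Sales]
    )
)
```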
Thanks both, @BrianJ and @AntrikshSharma, I really appreciate it. I follow your lessons and best practices! I'll keep fighting with the model and will let you know if I get stuck.