Many to many and performance

I’m working on a small model to find matching items in two tables related through a many-to-many relationship.
One table holds failures, i.e. outages for specific components. The other holds outage predictions computed by an ML module. I’m trying to verify how the ML is performing by checking how many outages were predicted and how many predictions are correct.
The production model is in Power BI, but for brevity I built a small model in Excel with 2 DAX formulas that are essentially the same as in PBI, and so is the model. The main concern is speed. I haven’t found a better solution than this one, but the tables are over 1M rows and the PBI formulas, even though they work correctly, take a very long time. Moreover, when I use them they return rows even when there is no prediction or outage.
Why is that?

Thanks for your help

Roberto




outages.xlsx (270.6 KB)

Hi @Roberto,

Some general observations to start with.

  • There is no date/time table in your model, which means Power BI will generate auto date/time tables hidden from view. You can check this with a tool like DAX Studio; these hidden tables can negatively impact model size.
  • The granularity of a combined date/time value is very high.

I don’t know if this will improve performance, but that’s no reason not to try, right?
I started with the model by adding Date and Time tables and splitting all combined date/time values into separate date and time parts.
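
For reference, here’s a minimal sketch of what those calculated tables could look like (the date range and the combined column name are assumptions, not taken from your file):

Dates = 
ADDCOLUMNS(
    CALENDAR( DATE( 2020, 1, 1 ), DATE( 2022, 12, 31 ) ),  -- assumed range, widen as needed
    "Year", YEAR( [Date] ),
    "Month", FORMAT( [Date], "mmm" )
)

Time = 
ADDCOLUMNS(
    GENERATESERIES( 0, 1439 ),  -- one row per minute of the day
    "Time", TIME( TRUNC( [Value] / 60 ), MOD( [Value], 60 ), 0 ),
    "Hour", TRUNC( [Value] / 60 )
)

-- Splitting a combined datetime (hypothetical column name) into date and time parts:
Startdate = INT( Failures[StartDateTime] )
Starttime = Failures[StartDateTime] - INT( Failures[StartDateTime] )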

Then created two measures:

Downtime = 
CALCULATE( 
    COUNTROWS( Failures ),
    -- failure started on or before the last date visible in the current context
    FILTER( VALUES( Failures[Startdate] ), Failures[Startdate] <= MAX( Dates[Date] )),
    -- and ended on or after the first date visible in the current context
    FILTER( VALUES( Failures[Enddate] ), Failures[Enddate] >= MIN( Dates[Date] )),
    -- same overlap test for the time portion
    FILTER( VALUES( Failures[Starttime] ), Failures[Starttime] <= MAX( 'Time'[Time] )),
    FILTER( VALUES( Failures[Endtime] ), Failures[Endtime] >= MIN( 'Time'[Time] ))
)

and

Outage predicted = 
NOT (
    ISEMPTY (
        CALCULATETABLE( 
            -- prediction date falls within the selected date range
            FILTER( VALUES( Predictions[Date] ), Predictions[Date] <= MAX( Dates[Date] )),
            FILTER( VALUES( Predictions[Date] ), Predictions[Date] >= MIN( Dates[Date] )),
            -- prediction time falls within the selected time range
            FILTER( VALUES( Predictions[Time] ), Predictions[Time] <= MAX( 'Time'[Time] )),
            FILTER( VALUES( Predictions[Time] ), Predictions[Time] >= MIN( 'Time'[Time] )),
            -- and an outage was actually predicted
            FILTER( VALUES( Predictions[outage] ), Predictions[outage] = 1 )
        )
    )
)

with this result:
[image]
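
If you also want to count how many predictions are correct, the two measures can be combined. Just a sketch (untested, assuming the measures above), counting the dates on which a predicted outage coincides with an actual failure:

Correct predictions = 
COUNTROWS(
    FILTER(
        VALUES( Dates[Date] ),
        -- context transition: both measures are evaluated per date
        [Outage predicted] && [Downtime] > 0
    )
)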

I hope this is helpful; here’s my sample file.
eDNA - Many to many and performance.pbix (146.1 KB)


Hi @Melissa,
thanks for the prompt answer. Your model does exactly what I was trying to do. I created the same formulas in my PBI model and speed is no longer an issue.

I had added date and time tables, but in my case they apparently didn’t help.

I didn’t understand the way you used inactive relationships to leverage the time dimensions.

Thanks again

Roberto

Hi @Roberto,

Glad to hear that worked well for you. :+1:
Please don’t forget to mark the post that answered your question as the Solution via the … (3 dots) menu.

The inactive relationships are required because there can be only one active relationship between two tables, BUT both Startdate and Enddate are, and should be, treated equally in these DAX calculations.
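
As a rough illustration (a sketch, assuming an inactive relationship from Failures[Enddate] to Dates[Date] exists in the model), USERELATIONSHIP activates an inactive relationship for the duration of a single calculation:

Failures ended in period = 
CALCULATE(
    COUNTROWS( Failures ),
    -- temporarily activate the inactive Enddate relationship
    USERELATIONSHIP( Failures[Enddate], Dates[Date] )
)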

@sam.mckay showcases how to deal with that in what’s known as the orders in progress pattern.

Furthermore, @BrianJ brought some videos together in this post on TREATAS and USERELATIONSHIP.
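
To give a quick flavour of TREATAS (just a sketch, not taken from those videos): it applies the filter values of one column to another, as if a physical relationship existed:

Predictions on failure dates = 
CALCULATE(
    COUNTROWS( Predictions ),
    -- push the dates visible in Failures[Startdate] onto Predictions[Date]
    TREATAS( VALUES( Failures[Startdate] ), Predictions[Date] )
)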

I’m confident that after reviewing these you’ll have a better understanding of how this works.
