Filtering Dimension Tables Based on the Fact Table

Hello, I am new to the forum.

My problem is somewhat general. I have multiple large dimension tables (one with 14 million rows) linked to a fact table. In order to reduce load time and table sizes, I would like a query editor method to filter the large dimension tables to contain only records where the key field is contained in the fact table.

If I set the cross-filter direction to "both", I get the desired result. However, this setting seems to be consistently discouraged. The reasons for not using "both" on the relationship seem to be (1) with multiple fact tables, ambiguity in how filters propagate back to the dimension, and (2) poor performance.

I also tried merging the dimension table with the fact table using an inner join and then removing the fact table fields, but this option takes a very long time to process.

This would seem like a common problem when record counts are high; however, I have been unable to find a best-practice solution in the courses or elsewhere.

I would appreciate information on any best practice solutions for handling this problem.

Thanks,

There could be a few things in play here, but the one that stands out is your dimension table of 14M rows?! That's not really a dimension table any more. Is there any chance to break it out a little further? What is in the tables (i.e., what types of data are being stored in your fact tables)? And are you using primary and foreign keys?

Have you tried pushing some of this back to the database? I'm assuming this data is coming from a relational source?

Basically, there's a lot that could help here, but without seeing a sample of the data it is hard to tell exactly.

Nick,


Totally agree with Nick on this one. 14M rows for a dimension table does not sound right to me…

That definitely sounds like some type of fact table associated with your other table somehow.

When working with tables this big, your best bet is to try to clean some of this up at the database layer. Power BI can be very fast, but that size really is pushing it a bit.

My guess is that there are likely substantial improvements that could be made at the database level, but it's hard to say exactly as I haven't seen the data on this one.

Thanks
Sam

Thanks for the reply Nick.
This is an inventory application. The 14M-row lookup table holds inventory locations down to the bin level. The fact table has inventory transactions, also at the bin level. The fact table is parameterized by date, so the user can limit the fact table load size based on transaction date. A 5M-row fact table would be a normal load, but there is no need to load locations that are not used by any transaction. (I can only say that my users need this level of granularity for their analysis.)

Having said that, I would think there should be a generic solution. Filter my lookup table to contain only those keys in my fact table. Seems like a simple problem. My simple solution (not so elegant) … set the cross filter to ‘both’.

@rlk,

The dimension tables are simply too big. I'm thinking there has to be a way to break them out into smaller parts, but it's hard to tell without seeing the data.

You really don’t want to use your fact tables as filters, which is one of the reasons you are seeing such a performance hit. And you really don’t want to use a 14m row table as a filter either.

Any chance you could upload a small sample of what you are working with?


Thanks again Nick, I appreciate you taking a look. I don’t want to trouble anyone with the specifics of the tables, but would like to try one more time to clarify what I believe is a general problem for which I was seeking solutions.

Problem Description:
Dimension and fact table views from a SQL database are made available to users for loading into Power BI desktop. Users are provided with parameterized templates with a pre-built data model. The parameter allows the user to enter a beginning transaction date in order to limit the number of pre-report fact rows that get loaded. (This all works great!).
Unfortunately, there is no pre-report parameter that can limit the row count for the dimension tables. Unchecked, they would load unnecessarily large row counts. I understand that loading millions of records into Power BI is not a good idea.

So, the problem that I so poorly presented was to find a way to trim the dimension tables in Power Query so that they contain only the keys loaded with the filtered fact table. This would seem like a common problem when the source data is in a star schema format similar to what I have described.
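To make the idea concrete, here is a minimal Power Query (M) sketch of that semi-join. The query and column names (`FactTransactions`, `DimLocation`, `LocationKey`) are placeholders, not my real schema:

```
// DimLocationFiltered: keep only dimension rows whose key appears in the
// already date-filtered fact query (a semi-join done in the query editor).
let
    // Distinct keys from the fact query; List.Buffer holds the list in memory
    // so it is not re-evaluated for every dimension row.
    FactKeys = List.Buffer(List.Distinct(FactTransactions[LocationKey])),
    // Keep only the locations referenced by at least one fact row.
    Filtered = Table.SelectRows(DimLocation, each List.Contains(FactKeys, [LocationKey]))
in
    Filtered
```

An inner merge (Table.NestedJoin with JoinKind.Inner, then removing the joined column) expresses the same semi-join and, when both queries come from the same SQL source, may fold back to a server-side join, which is usually much faster than filtering row by row.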

I am a SQL developer, so my fallback solution is outside of Power BI: use a Table-Valued Function (TVF) to replace each dimension view. I can pass the same beginning-date parameter to the TVF, which would join to the fact table and return only the dimension rows whose keys appear in the filtered fact table.
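A sketch of that TVF, with placeholder object and column names standing in for my real schema:

```sql
-- Inline TVF replacing the DimLocation view: returns only the locations
-- referenced by fact rows on or after @StartDate.
CREATE FUNCTION dbo.fn_DimLocation (@StartDate date)
RETURNS TABLE
AS
RETURN
    SELECT d.*
    FROM dbo.DimLocation AS d
    WHERE EXISTS (
        SELECT 1
        FROM dbo.FactInventoryTransaction AS f
        WHERE f.LocationKey = d.LocationKey
          AND f.TransactionDate >= @StartDate
    );
```

Power BI would then call the function with the same date parameter used to filter the fact view, so the two loads stay consistent.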

Thanks again.

I’m sure there is a way to potentially do something in the query editor here, but it would depend on what the data looks like ultimately.

Are you able to load a sample of your data (or just the data structure) in a pbix file? That would make it easier to present some more meaningful ideas, I think.

Thanks
Sam

Thanks to the contributors of this post. Due to the length of time this post has been active, we are tagging it as Solved. For further questions related to this post, please make a new thread.