Can I improve query performance on my large table in Power BI?

Good day,

I have a fact table from SQL Server named Sales that serves as my base query. Three other queries reference Sales, group the results by different combinations of IDs, and sum the revenue, margin, and quantity fields. On the Sales query itself I have set ‘Load to Report’ and ‘Enable Refresh of this Query’ to FALSE.

When I refresh each of these queries, Power BI spins for a good 15 minutes and retrieves around 7M records from SQL Server every time. According to a tip I read, the source data should only be loaded once when queries reference a base query, regardless of how many queries reference it. That is not what I’m seeing in my model.

Anyone here also experienced this? Or do you have any other tips to improve the performance?

Joyce

Hi there,

The big thing for me here is that you seem to be doing calculations and summaries in your queries. Personally, I would not do this on a large data set because, as you’re experiencing, it will be very slow.

You should simplify what you load from your core Sales table and then do the calculations using DAX measures. This is the most efficient way to work in Power BI, because the storage engine aggregates the data at query time instead of you materializing the summaries during refresh.
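For example, the three grouped queries can usually be replaced with a handful of measures. A minimal sketch, assuming your Sales table has columns named Revenue, Margin, and Quantity (rename to match your actual fields):

```dax
Total Revenue  = SUM ( Sales[Revenue] )
Total Margin   = SUM ( Sales[Margin] )
Total Quantity = SUM ( Sales[Quantity] )
```

Then just drop the ID columns you were grouping by into a visual alongside these measures; the engine performs the group-by for you, so Sales is loaded once and the three reference queries are no longer needed.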

I’m sure your Sales table doesn’t need to be that wide (i.e. have that many columns). It’s about getting the table as lean as possible, and possibly doing a bit of work in SQL as you bring it into your Power BI model.
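As a sketch of that SQL step (the column names here are hypothetical, so substitute the fields your model actually uses), select only the columns you need rather than the whole table:

```sql
-- Narrow the table at the source instead of pulling every column
SELECT
    CustomerID,
    ProductID,
    OrderDate,
    Revenue,
    Margin,
    Quantity
FROM dbo.Sales;
```

Fewer, narrower columns mean less data over the wire and better compression once the data lands in the model.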

Some background:

When queries are loaded to the report, each one is evaluated in isolation. They may also be evaluated by different processes, so they do not form a dependency tree.

Power Query keeps an on-disk cache of data seen during evaluations. So if the same source is pulled multiple times within the same cache session, you only pay for it the first time.

This cache only applies to the raw data coming from the data source; any additional transformations still need to be performed on top of it.

Each loading session is a new cache session, since you always want the latest data when you load queries to the report.

So you only pay for the data coming from the data source once per load of all the queries, but you pay for the transformations on top of that every time.

In your situation, DirectQuery can also help with performance. With DirectQuery you always operate on the remote data source and never keep a local copy of the data.

So there isn’t a “loading” phase, and you pay for the data only as the visuals need it.

Hopefully this gives you some background.

Check out this course module for many more ideas on optimizing your queries and tables.

Thanks
Sam

Thanks to the contributors of this post. Given how long this post has been active, we are tagging it as Solved. For further questions related to this post, please start a new thread.