How to Scale Direct Query Reports

krista.cole · July 25, 2023, 8:00pm

Hello,

We have developed some Direct Query Power BI reports which connect to our AWS SQL Server database and refresh every 5 minutes. Leadership loved the POC and wanted to scale to multiple locations, but now we are finding it creates to heavy a load on the server so have paused the project.

The query is not terribly complex. It has a few subqueries, many joins on 10 tables, some of which are > 1 million rows and outputs only ~5k rows.

What are some potential solutions and best practices given this situation? Here are some things I am considering
-Use staging tables with SQL Server Agent job
-Send data to our datalake in Snowflake instead of querying the production database (not sure how to productionalize this, I made a Python demo which sends the data to Snowflake, so in theory this could work). Maybe use third party service for this?
-Enable CDC in database (again not sure which product(s) we would use after this is enabled)
-Use Azure Synapse Analytics since it is purpose built to handle Direct Query at scale (probably use the same Python script, but this does not seem like a robust solution, surely there is an out of the box product in Azure)

Any help, ideas, suggestions are greatly appreciated.

Thanks for your time,
Joe

jsodhi · August 5, 2023, 7:06pm

Hey @krista.cole

If you are refreshing the data evey 5 minutes, i believe you are on Import mode and not on direct Query mode.

As you mentioned that from a table of 1 million rows, the final output is of just 5k rows, The best way around is to create a summary table. if using direct query, you can create this within your SQL environment itself. You dont need to connect to any other platform.

even if you are on import mode, you should consider creating a summary table in modelwhere ever you can. this is always a good technique to avoid any performance issues just like the one you have mentioned.

there are a few nuances when creating such summary tables in the model. you can look out for some blogs / youtube videos regarding this

I hope this helps.

Regards
Japjeet

krista.cole · August 7, 2023, 2:41pm

Thank you Japjeet! This is what I proposed to IT, I think a SQL Server Agent job which creates this summary table will be the best immediate solution to help scaling this report.

And FYI it is already in Direct Query, that is how it achieves the 5 minute refresh.