85 Million Rows

Dear All,

We are developing reports off a not-so-well-designed data model. We have 85 million rows in the database. Currently users are able to visualize the data through Spotfire visualizations because Spotfire (TIBCO) allows on-demand/dynamic querying. We are now building the same visualizations in Power BI.

If we use Import mode, the dataset is too large, and even if we use DirectQuery we are not able to pass parameters on the fly. For example, the users are looking at 13 months of historical data. If there were a way to query the data for each selected day, the reports would work fine, but Power BI doesn't allow this kind of on-demand data load.

If someone can point me to content in this forum or Sam's tutorials, or suggest a workaround, it would be a great help.

Thank you

I think the reality here is you want to attempt to summarize the data somehow and reduce the number of rows considerably before looking to use this in Power BI.

A few things to consider:

  • Do you really need all of the data?
  • Can you reduce or summarize it using SQL queries as you query it in Power BI?
  • Do you need extensive amounts of historical data?
  • What summarization or segmenting of the data can you complete outside of Power BI?

Also, depending on the setup of your data infrastructure, DirectQuery mode can be useful, but I'm unsure about this based on what you've provided thus far.

If you can find a way to summarize this data somehow outside of Power BI, that's what I would be aiming for. There is a range of tools (like SQL Server, for example) which are purpose-built for this type of thing.

Sam

Hi,

Thank you very much for your suggestions. Unfortunately we have no control over the data model, and we cannot even summarize the data.

We receive the data from a single view with 100 columns, and users have been using reports with both summary and detailed information on the same report, which forces us to load all the data. Also, their existing reports have many slicers which are columns of that table. This makes things difficult because we have to create mapping tables and use the RELATED function (learned from you); otherwise we need to unpivot the whole table.
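For example, something along these lines as a calculated column on the big table (the table and column names here are just placeholders):

    -- Calculated column on the wide table, pulling a grouping value
    -- from a small mapping table on the one side of the relationship
    Category = RELATED ( ProductMapping[Category] )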

The following are a couple of ideas I want to try. Let me know what you think.

Power BI is not event-driven, but I am thinking we could let the users select a date range on a tab and click on the icon of the report they want to see (like your application-style visualization lesson), and while the report moves to that bookmark, the data gets loaded in the background through parameters. Does this work? It would be almost like an on-demand data load.

Can we create the whole data model inside Power BI? That is, query (load) the table, write DAX to create summary data tables and slicer tables on the fly, and use the newly created tables to display data in the reports. This would avoid the unpivoting problem.
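Very rough sketch of what I mean, with made-up table and column names:

    -- Summary table created as a DAX calculated table
    Daily Summary =
    SUMMARIZECOLUMNS (
        BigTable[Date],
        BigTable[Region],
        "Total Amount", SUM ( BigTable[Amount] ),
        "Row Count", COUNTROWS ( BigTable )
    )

    -- Small slicer table holding just the distinct values of one column
    Region Slicer =
    DISTINCT ( SELECTCOLUMNS ( BigTable, "Region", BigTable[Region] ) )

(I realize calculated tables are only built after the big table is already imported, so this would help with the slicers and the unpivoting rather than with the model size.)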

The width of that table is going to be your main issue. DAX works on columns, not rows, so the number of rows really shouldn't be the concern at this point, but 100 columns is just way too much. Not sure if it's possible, but I would try to work with your IT team to see if you can get those columns way, way down. You would want to take the majority of those columns out of your fact table, create dimension tables, and relate those to your fact table.

You said you don't have much control, but what if you load the giant table (only because you have to) and then create your dimension tables and such in Power Query?
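Rough sketch of the shape I mean, written as a DAX calculated table just for illustration (you could build the same thing with a reference query in Power Query; all names are placeholders):

    -- Small dimension carved out of the wide table,
    -- to be related back to BigTable on CustomerID
    Customer =
    DISTINCT (
        SELECTCOLUMNS (
            BigTable,
            "CustomerID", BigTable[CustomerID],
            "CustomerName", BigTable[CustomerName]
        )
    )

Then you create the relationship from BigTable[CustomerID] to Customer[CustomerID] and point your slicers at the dimension instead of the big table.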

Nick,


Nick_M,
Thanks for your suggestion. I have removed all the unnecessary columns in the query and reduced the number of columns to 60, and there is a good improvement in performance, but the problem is much bigger. We don't have control over the data model. The current reporting system uses on-demand queries, which lets the users view data of a much larger size, whereas Power BI doesn't make a trip to the data source once the model is loaded.

I am hoping there is a workaround for this problem. I would greatly appreciate any help in this direction.

Ramesh

Not sure if it's possible (it may be, since you were able to delete columns), but what if you import your giant table into Power Query and then build a new data model on top of that? That's how I do a lot of reports in my current job: get a giant data file with everything, and from there create my dimension tables and so forth.

If you are stuck with one giant table, I think you may be out of luck in terms of performance. DAX was built to work on columns (not rows, like a traditional database), so the more columns in a table, the worse the performance. Plus, you have to consider the unique values in each column, since VertiPaq compresses much better when a column has fewer unique values.

Another thing to consider in terms of performance: when you write DAX, chances are you will use FILTER (or something similar). You always want to filter your smaller dimension tables and let that flow through to your fact table. When you filter your fact table directly, you are in for a tremendous performance hit.
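Roughly speaking, the difference looks like this (measure, table, and column names are just placeholders):

    -- Filter applied to a small dimension table: the filter flows through
    -- the relationship to the fact table
    West Sales =
    CALCULATE (
        SUM ( BigTable[Amount] ),
        Customer[Region] = "West"
    )

    -- FILTER iterating the wide fact table row by row: much more expensive on 85M rows
    West Sales (slow) =
    CALCULATE (
        SUM ( BigTable[Amount] ),
        FILTER ( BigTable, BigTable[Region] = "West" )
    )

If a column like Region only exists on the big table, that's one more reason to break it out into a dimension first.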

Just some food for thought.

Nick,
