Dealing with large daily datasets

Hi guys,

I'd like to get your opinion on a case I dealt with recently. I have one calendar table and one fact table; the fact table is 35M rows long. When I used DATEADD with -1 the data moved up, and with +1 it moved down, which is the opposite of what should happen. When I tried it on a shortened dataset, with no changes to the relationships or any other formulas, it moved in the right direction. Then today the full dataset started displaying the correct direction as well. Has anyone ever encountered something similar? Also, when there is a need to work with daily data across 35M rows, what tips or tricks are there to make it easier on the model without deleting any data?

Additional context:

  1. The calendar table has been marked as date table and is linked to the fact table via a 1 to many relationship.
  2. Granularity of the data is daily: each row has the date, the member's name, whether the membership is active as of that date, the membership type, and 5 other subtypes.
  3. Formula that didn't initially work: =CALCULATE(DISTINCTCOUNT('Fact'[Serial Number]), DATEADD('Calendar'[Date], -1, YEAR)). A cleaned-up version is shown after this list.
  4. I realized that for this case specifically, yes, the data can be folded in the query, but in cases where it can't be, what should be done? Does adding more dimension tables reduce the burden on measures?
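
For reference, here is the measure from item 3 written out as a full measure definition with straight quotes. The table and column names ('Fact'[Serial Number], 'Calendar'[Date]) come from the post; the measure name is only a placeholder.

    // Distinct serial numbers for the same period shifted back one year.
    // 'Fact'[Serial Number] and 'Calendar'[Date] are from the post; the measure name is made up.
    Serials PY =
    CALCULATE (
        DISTINCTCOUNT ( 'Fact'[Serial Number] ),
        DATEADD ( 'Calendar'[Date], -1, YEAR )
    )

With the calendar marked as a date table and related 1-to-many to the fact table, SAMEPERIODLASTYEAR ( 'Calendar'[Date] ) is equivalent to DATEADD ( 'Calendar'[Date], -1, YEAR ) and can be easier to read.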

Optimizing Performance:

  • Considering your fact table contains 35 million rows, you may encounter performance issues when attempting to analyze daily data across such a large dataset.
  • It’s recommended to aggregate or summarize the data to a higher level of granularity, like monthly or quarterly, especially if you don’t need to analyze at the daily level.
  • Aggregating the data will reduce the number of rows processed by measures, resulting in improved performance.
  • You can create a new table or use Power Query to summarize the daily data into a higher-granularity table and establish relationships with the other tables; a sketch of this is shown below.
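
As a minimal sketch of that last point, a calculated table along these lines could roll the daily rows up to month level. Only 'Fact', 'Calendar', [Serial Number] and [Date] come from the post; the Year, Month and Membership Type columns and the table name are assumptions about the model.

    // Monthly roll-up of the daily fact table (illustrative names only).
    // Assumes 'Calendar' has Year and Month columns and 'Fact' has a Membership Type column.
    Fact Monthly =
    SUMMARIZECOLUMNS (
        'Calendar'[Year],
        'Calendar'[Month],
        'Fact'[Membership Type],
        "Distinct Members", DISTINCTCOUNT ( 'Fact'[Serial Number] )
    )

The same roll-up can be done in Power Query with a Group By step, which keeps the work in the query and can fold back to the source when the connector supports it.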

Hi Sam,

Thank you for responding. Ultimately, yes, that is what I did: I moved to a higher granularity. However, there may be instances when this is not possible, for example when the data is already annual and there are still 35M rows because you operate a very large business with millions of unique transactions annually. In that type of scenario, what do you advise? Does adding additional dimension tables or splitting the fact table reduce the load? For example, if the columns are ID number, Year, Name, Address, and Age, do you create 3 tables of ID-Year-Name, ID-Year-Address, and ID-Year-Age? Or is this a case in which Power BI may not be suitable? If so, would you know the maximum number of rows or cells when using PBI?
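
To make the split I'm describing concrete, I mean something roughly like the following calculated tables; the table and column names (ID, Year, Name, Address, Age) are just the hypothetical ones from my example.

    // Purely illustrative: breaking the wide fact table into narrower tables,
    // one descriptive column per table, keyed by ID and Year.
    Fact Name =
    SELECTCOLUMNS ( 'Fact', "ID", 'Fact'[ID], "Year", 'Fact'[Year], "Name", 'Fact'[Name] )

    Fact Address =
    SELECTCOLUMNS ( 'Fact', "ID", 'Fact'[ID], "Year", 'Fact'[Year], "Address", 'Fact'[Address] )

    Fact Age =
    SELECTCOLUMNS ( 'Fact', "ID", 'Fact'[ID], "Year", 'Fact'[Year], "Age", 'Fact'[Age] )

(In practice the split would presumably happen in the source query rather than as calculated tables; this is just to show the shape I have in mind.)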