Columns vs. Rows - Impact on processing speed

We’re working with a lot of data; 40 million rows. My colleague who pulls the data with SQL queries broke sales out into 3 columns, “Single Unit Sales”, “2 to 9 Unit Sales”, and “10+ Unit Sales”, referring to the number of units purchased in the transaction. I thought it would be better to have “# of Units in Transaction” as its own column, so that each bucket becomes a separate row, and just have one “Sales” column, but he doesn’t think that would be efficient because it would increase the number of rows from 40 million to 120 million.
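For concreteness, here’s roughly what I mean by the single “Sales” column layout; the table and column names below are just placeholders, not our actual schema:

```sql
-- "Tall" layout: one Sales column plus a units bucket,
-- so each grouping can appear up to three times (hence ~120M rows).
SELECT
    week_ending,          -- placeholder grain columns
    product_id,
    CASE
        WHEN units_in_transaction = 1 THEN 'Single Unit'
        WHEN units_in_transaction BETWEEN 2 AND 9 THEN '2 to 9 Units'
        ELSE '10+ Units'
    END AS units_in_transaction_bucket,
    SUM(sales_amount) AS sales
FROM transactions
GROUP BY
    week_ending,
    product_id,
    CASE
        WHEN units_in_transaction = 1 THEN 'Single Unit'
        WHEN units_in_transaction BETWEEN 2 AND 9 THEN '2 to 9 Units'
        ELSE '10+ Units'
    END;
```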

Which approach would result in the fastest processing speed in Power BI?

Hello @Eric,

Thank you for posting your query on the forum.

As a general rule, the smaller the data size, the more efficient the model will be.

You can calculate the single-unit and multi-unit figures by writing measures in Power BI itself (for example, a measure that sums Sales filtered to the single-unit bucket) rather than loading three times the actual data volume.

Hoping this helps you.

Thanks & Warm Regards,
Harsh

Thanks. Our data is not by transaction, it’s aggregated by week. So you are suggesting multiple columns to separate out single-unit and multi-unit sales, instead of rows?

Hi @Eric. The smallest “size” of your reporting needs should dictate the “grain” of your data. It sounds like you have access to SQL staff (or yourself) who can shape the source query, so whenever possible do your grouping/aggregating/transformations in your data source; this will minimize not only the volume of your data, but also the work that Power BI has to do to present it.

With respect to multiple columns, if you need only a grid of 1, 2-9, and 10+ by week, then this sounds like an avenue to explore (see the sketch below). If follow-on reports may need other groupings, then perhaps you should request a smaller “grain” and adjust as necessary in PBI.

Greg
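To make that concrete, a source query that does the grouping by week and splits the three unit buckets into columns might look something like the sketch below; the table and column names are illustrative only, not @Eric’s actual schema:

```sql
-- "Wide" layout at week grain, aggregated in the data source:
-- one row per week/product, with the three unit buckets as columns.
SELECT
    week_ending,
    product_id,
    SUM(CASE WHEN units_in_transaction = 1
             THEN sales_amount ELSE 0 END) AS single_unit_sales,
    SUM(CASE WHEN units_in_transaction BETWEEN 2 AND 9
             THEN sales_amount ELSE 0 END) AS sales_2_to_9_units,
    SUM(CASE WHEN units_in_transaction >= 10
             THEN sales_amount ELSE 0 END) AS sales_10_plus_units
FROM transactions
GROUP BY week_ending, product_id;
```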


Hello @Eric,

Yes, in this case adding the 3 columns seems to be the more viable option rather than loading such a huge volume of data. Although we generally prefer a data model to be long and narrow rather than wide, under these circumstances the 3-column layout is the better fit, treated as an exception. And as @Greg has rightly mentioned, if it’s possible to do your transformations in the data source, carry out those operations there; that will help with report optimization.

Thanks and Warm Regards,
Harsh
