Active Product Path with large datasets

Hi,

I need some help with this challenge :blush:

What do I want?
I need a report that shows, for each day, the number of customers per combination of active products.

For example:

| Date | A | A, B | B | C | Total |
| --- | --- | --- | --- | --- | --- |
| 09-12-2019 | 1 | 2 | 1 | 2 | 6 |
| 08-12-2019 | 1 | 1 | 1 | 2 | 5 |

Etc.

So on the 9th of December there is 1 customer whose only active product is A, there are 2 customers who have both active product A and active product B (ActiveProductPath = A, B), there is 1 customer whose only active product is B, and there are 2 customers whose only active product is C.

Dummy file and a solution
You can find a dummy file with a solution here: ActiveProductPathFinalv2.pbix (118.8 KB)

You already have a solution, so why this question?
The technique used in that solution doesn't work for large datasets because it runs out of memory.

Can you explain this?
The dummy file contains 14 products (= rows); with the technique used (day-customer-productpath) this becomes 922 rows (= 65.86 × 14).

I have another file with 7,769 different products, and with the technique used in the dummy file the "day-customer-productpath" table became 4,965,732 rows (= × 639.25)! This wasn't a problem for Power BI.

But now I have a file with 1.7 million different products, and with this large dataset my computer runs into a memory problem…

So what’s your question?
Is it possible to solve this challenge without the technique used in the dummy file? In other words, is there a DAX solution that creates virtual tables and counts customers with different "ActiveProductPaths"?

Hope to hear from you soon.

Thanks in advance!

Regards from the Netherlands,

Cor

Hi Cor,

Couple of thoughts on this that may help.

First, DISTINCTCOUNT is a very slow function, and it gets even slower with larger datasets.

You are better off using COUNTROWS( VALUES( … ) ).
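
As a minimal sketch of that swap (Sales[CustomerID] is just a placeholder column, not something from your dummy file):

```
-- Sales[CustomerID] is a placeholder column name
Active Customers (slow) =
DISTINCTCOUNT ( Sales[CustomerID] )

-- Same result, usually faster on larger models
Active Customers (faster) =
COUNTROWS ( VALUES ( Sales[CustomerID] ) )
```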

You could also adjust the VALUES virtual table (or use something else) with any number of different table functions like FILTER, SUMMARIZE, etc. That way you could potentially reduce the size of the virtual table that you are looking to count up. This would be a solid place to start.
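
For example, the count could run over a table that has already been filtered and summarized down to one row per customer. This is just a sketch; Sales, Sales[IsActive] and Sales[CustomerID] are placeholder names for whatever your model actually uses:

```
-- Count over a reduced virtual table instead of the full fact table.
-- Sales, Sales[IsActive] and Sales[CustomerID] are placeholder names.
Active Customers (reduced) =
COUNTROWS (
    SUMMARIZE (
        FILTER ( Sales, Sales[IsActive] = TRUE () ),
        Sales[CustomerID]
    )
)
```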

Check out this video here to learn a bit more about that.

Also, based on the details you've provided, it's not really clear exactly which part is slow. There's only one measure being used in the example file (which is the one above).

Also, 1.7M rows is a lot. You have to find a way to reduce the number of iterations a calculation has to do, potentially by using virtual tables within a formula. This is the key. You want DAX to complete the calculation over a summarized or reduced version of that physical table, because having to work through 1.7 million individual rows is too much.
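
To give a rough idea of what that could look like for the ActiveProductPath counts themselves, here is a sketch only. I'm guessing at a model here: a Sales table with CustomerID, Product, StartDate and EndDate columns, a disconnected Date table on the report rows, and a disconnected Paths table (with values like "A" and "A, B") on the report columns. All of those names are assumptions, so they would need adjusting to your actual file:

```
-- Sketch only; every table and column name here is an assumption.
Customers per Active Product Path =
VAR CurrentDate = MAX ( 'Date'[Date] )
VAR SelectedPath = SELECTEDVALUE ( Paths[ActiveProductPath] )
VAR CustomerPaths =
    ADDCOLUMNS (
        VALUES ( Sales[CustomerID] ),              -- one row per customer
        "@Path",
            CALCULATE (
                CONCATENATEX (                     -- build the path string, e.g. "A, B"
                    VALUES ( Sales[Product] ),
                    Sales[Product],
                    ", ",
                    Sales[Product], ASC
                ),
                Sales[StartDate] <= CurrentDate,   -- keep only products that are
                Sales[EndDate] >= CurrentDate      -- active on the selected date
            )
    )
RETURN
    COUNTROWS ( FILTER ( CustomerPaths, [@Path] = SelectedPath ) )
```

The path strings in the Paths table would have to match the CONCATENATEX output exactly, and the Total column would need its own handling (for example, counting all non-blank paths). The key point is that the paths are built virtually, one row per customer, instead of materializing a day-customer-productpath table.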

Thanks
Sam

Thanks Sam, great video! I'll try this technique in the coming days. :+1: