I am looking for slight different variation of Pareto principle.
I went through this but this doesn’t solve my problem.
I am looking for in a given list, all the customers which comprises of 80% of the overall sale. Please keep in mind I am not looking for straight up 20% customers.
Something like this I am envisioning,
Give me the max rank, when sales is 0.80 of overall sales and then give me those customers.
Hi @rit372002, I hope that you are having a great experience using the Support Forum so far. We’ve recently launched the Enterprise DNA Forum User Experience Survey, please feel free to answer it and give your insights on how we can further improve the Support forum. Thanks!
Yes, it is slow to the fashion that it’s unusable. I am not sure why it would work for the way I was doing it (topN approach), which is un-acceptable to the client.
And Why it complains about resources for your formula even if I choose 1 Parent Retailer which has say 10 growers. Shouldn’t it should look for just 10 growers and not scan the whole table?
Still searching for answers. Really appreciate your solutions.
Your solution looks good to me, and I don’t see any obvious way to speed it up. However, I’m looping in Enterprise DNA expert @Nick_M, who is exceptional on issues of DAX optimization.
in your Time table, change the data type to date and not date time
There are a lot of columns in your Fact table, try to remove some of them or create dimensions out of them. 56 columns in any table is too many, but even more so in your fact tables.
It all comes down to how DAX best works. At a high level DAX best works with skinny/long tables vs. short wide tables set up as a star (or snowflake) schema. Seems like you are getting your data from SQL which is set up in rows (and not columns like DAX likes) so that’s where I would I start.
If you dont need a column, do not import it
Fact tables should be only used for aggregating data
Dimension tables should be used to filter the fact table
1:M relationships are much faster than M:M or 1:M bidirectional
Only after setting up the data model the best for DAX would I start to look to optimize the actual DAX expression.
Pareto Sales 2 =
VAR ActualSales = [Enterprise total Sales]
VAR SelectedSales = CALCULATE(
[Enterprise total Sales],
ALLSELECTED( GROWER[CRM_PARENT_GROWER_MASTER_KEY] )
)
VAR VT = FILTER(
SUMMARIZE(
ALLSELECTED( RETAILER[Parent Retailer AB Number] ),
RETAILER[Parent Retailer AB Number],
"@Sales", [Enterprise total Sales]
),
[@Sales] >= ActualSales
)
VAR Calc1 = SUMX( VT,[@Sales] )
VAR Result = DIVIDE( Calc1, SelectedSales, 0 ) Return Result
Now, I think if I get running total of this and then divide it by overall sale (in this case, 3.1 million) then I can get 80% of total sale. Very close.
One mistake in the below VAR ( after ALLSELECTED the reference column was wrong):
VAR SelectedSales =
CALCULATE(
[Enterprise total Sales],
ALLSELECTED( RETAILER[Parent Retailer AB Number] )
Pareto Sales 2 =
VAR ActualSales = [Enterprise total Sales]
VAR SelectedSales = CALCULATE(
[Enterprise total Sales],
ALLSELECTED( RETAILER[Parent Retailer AB Number] )
)
VAR VT = FILTER(
SUMMARIZE(
ALLSELECTED( RETAILER[Parent Retailer AB Number] ),
RETAILER[Parent Retailer AB Number],
"@Sales", [Enterprise total Sales]
),
[@Sales] >= ActualSales
)
VAR Calc1 = SUMX( VT,[@Sales] )
VAR Result = DIVIDE( Calc1, SelectedSales, 0 )
Return
Result
If you change logic in measure, the change should also be applied to the table.
This reference column: [CRM_PARENT_GROWER_MASTER_KEY]
has been replaced with this one: RETAILER[Parent Retailer AB Number]
After smal adjustment, the measure is working, but again, it is tooooo slow (unusable).
Maybe the approach to solve the problem should be different.
Thanks @mno but I need this table to have individual growers in that table, not retailers. When you changed the formula to Retailer, it gives me data at the Retailer level. I need it at the grower level.
So I need it to sum those grower numbers (percentages).
Lol. I am in catch 22. The first formula doesn’t return anything for me. It just spins. Is there a no way to add the running sum with the formula variation that I did?
Selected Sales: Grower
VT : Retailer
Pareto Sales 3 =
VAR ActualSales = [Enterprise total Sales]
VAR SelectedSales = CALCULATE(
[Enterprise total Sales],
ALLSELECTED( GROWER[CRM_PARENT_GROWER_MASTER_KEY])
)
VAR VT = FILTER(
SUMMARIZE(
ALLSELECTED( RETAILER[Parent Retailer AB Number] ),
RETAILER[Parent Retailer AB Number],
"@Sales", [Enterprise total Sales]
),
[@Sales] >= ActualSales
)
VAR Calc1 = SUMX( VT,[@Sales] )
VAR Result = DIVIDE( Calc1, SelectedSales, 0 ) Return Result
Thanks @mno. I really appreciate you spending so much time on this. Its so sincere.
Unfortunately, I proposed similar solution to begin with to the client. Please see this pic. It should be the first tab in the original file that I sent in the beginning. I was also using TopN to reach to the 80%.
They said no. They needed to see static list without their sales people doing sliders and reaching to 80%.
I was thinking about your case, trying power query and DAX and nothing is working as expected.
However, I’m only PBI amator, so I believe PBI experts will help you out.
I’m curious how your challenge could be solved.
If you come up with the solution and if you don’t mind, could please share?
@mno So my boss came to rescue. He created pretty powerful and elegant SQL with some subqueries and partition by clauses. I am using that as standalone table. It works perfect for this situation and it’s pretty fast. Here it is to just give an idea. Thanks for all of your help.
SELECT
a.crop_year_key,
a.parent_retailer_key,
-- a.parent_retailer_ab_name,
a.crm_parent_grower_master_key,
a.grower_reported_name,
a.grower_physical_city,
a.grower_physical_state,
a.retailer_grower_sales_amount,
a.total_sales_amount,
a.total_sales_amount_pct,
a.running_pct
FROM
(
SELECT
wfas.crop_year_key,
wfas.parent_retailer_key,
wfas.crm_parent_grower_master_key,
wfas.grower_reported_name,
wfas.grower_physical_city,
wfas.grower_physical_state,
wfas.retailer_grower_sales_amount,
wfas.total_sales_amount,
((wfas.retailer_grower_sales_amount / wfas.total_sales_amount) * 100) total_sales_amount_pct,
sum(((wfas.retailer_grower_sales_amount / wfas.total_sales_amount) * 100)) over (partition by parent_retailer_key
order by ((wfas.retailer_grower_sales_amount / wfas.total_sales_amount) * 100) desc) as running_pct
FROM (
SELECT crop_year_key,
parent_retailer_key,
wdgm.crm_parent_grower_master_key,
wdgm.grower_reported_name,
wdgm.grower_physical_city,
wdgm.grower_physical_state,
SUM(pos_sales_amount) OVER (PARTITION BY crop_year_key, parent_retailer_key, wdgm.crm_parent_grower_master_key) retailer_grower_sales_amount,
SUM(pos_sales_amount) OVER (PARTITION BY crop_year_key, parent_retailer_key) total_sales_amount,
ROW_NUMBER() OVER (PARTITION BY crop_year_key, parent_retailer_key, wdgm.crm_parent_grower_master_key ORDER BY invoice_number) keep_flag
FROM wf_marts.wfs_f_agrimine_pos pos
INNER JOIN wf_marts.wfs_d_grower_master_v wdgm ON pos.grower_master_key = wdgm.grower_master_key
INNER JOIN wf_marts.wfs_d_product wdp on pos.product_key = wdp.item_number_id
WHERE crop_year_key in( 2019,2020) and wdp.business_segment_name = 'CROP PROTECTION' and parent_retailer_key <> 0 -- and parent_retailer_key IN ( 7023923, 7025806)
) wfas
WHERE
wfas.keep_flag = 1
ORDER BY
wfas.parent_retailer_key, ROUND((wfas.retailer_grower_sales_amount / wfas.total_sales_amount) * 100) DESC
) a
WHERE
a.running_pct < =100
;