Slight variation of 80/20 rule

I am looking for slight different variation of Pareto principle.
I went through this but this doesn’t solve my problem.

I am looking for in a given list, all the customers which comprises of 80% of the overall sale. Please keep in mind I am not looking for straight up 20% customers.
Something like this I am envisioning,
Give me the max rank, when sales is 0.80 of overall sales and then give me those customers.

Thanks,
Ritesh

Hi @rit372002.

To help us further visualize your issue, could you please provide as many as you can of:

  • Your work-in-progress PBIX file, using sanitized data if necessary
  • Your dataset as an Excel file (again, sanitized if necessary)
  • A mock-up (marked-up screenshot or Excel file) of your desired outcome.

Greg

Have you tried the disconnected table trick? Maybe in conjunction with the dynamic grouping.

Hi Ritesh,

If I understood correctly the following measure should do the trick.

Pareto Sales =
VAR ActualSales = [Total Sales] 
VAR SelectedSales = CALCULATE(
   [Total Sales],
    ALLSELECTED( Data[Customer] )
) 
VAR VT = FILTER(
    SUMMARIZE(
        ALLSELECTED( Data[Customer] ),
        Data[Customer],
        "@Sales", [Total Sales]
    ),
   [@Sales] >= ActualSales
) 
VAR Calc1 = SUMX( VT,[@Sales] ) 
VAR Result = DIVIDE( Calc1, SelectedSales, 0 ) 

Return 
Result
Total Sales =
SUM( Data[Sales] )

The column chart has been coloured using conditional formatting.

Pareto_v1.pbix (25.5 KB)

Take care,
Mariusz

1 Like

Hi Mariusz,

I tried your formula. It spins and nothing happens. Shown in the pic. I have annonymized data. Attached is the link for pbix.

https://drive.google.com/file/d/1vgxQj7VrtmTbwEuncubil4fd9tpaihLo/view?usp=sharing

The way I was working was: using topN formula. By clicking on different topn, I was trying to reach t 80% of sales. But that solution is unacceptable.

I like your solution but as you can see it is not working. Is it something wrong with my model?

Thanks,
Ritesh

HI Ritesh,

Here you go, the measure should look like that:

Pareto Sales =
VAR ActualSales = [Enterprise total Sales] 
VAR SelectedSales = CALCULATE(
   [Enterprise total Sales],
    ALLSELECTED( GROWER[CRM_PARENT_GROWER_MASTER_KEY] )
) 
VAR VT = FILTER(
    SUMMARIZE(
        ALLSELECTED( GROWER[CRM_PARENT_GROWER_MASTER_KEY] ),
        GROWER[CRM_PARENT_GROWER_MASTER_KEY],
        "@Sales", [Enterprise total Sales]
    ),
   [@Sales] >= ActualSales
) 
VAR Calc1 = SUMX( VT,[@Sales] ) 
VAR Result = DIVIDE( Calc1, SelectedSales, 0 ) Return Result

However, in your case, the measure is working much too slow.
I think, better solution should be find, or maybe you can apply some filters (time, etc.)

If you are not happy with the solution, please keep the post open.
There are some brilliant people who will help you.

Take care,
Mariusz

1 Like

Hi @rit372002, I hope that you are having a great experience using the Support Forum so far. We’ve recently launched the Enterprise DNA Forum User Experience Survey, please feel free to answer it and give your insights on how we can further improve the Support forum. Thanks!

Yes, it is slow to the fashion that it’s unusable. I am not sure why it would work for the way I was doing it (topN approach), which is un-acceptable to the client.

And Why it complains about resources for your formula even if I choose 1 Parent Retailer which has say 10 growers. Shouldn’t it should look for just 10 growers and not scan the whole table?

Still searching for answers. Really appreciate your solutions.

Thanks,
Ritesh

@mno,

Your solution looks good to me, and I don’t see any obvious way to speed it up. However, I’m looping in Enterprise DNA expert @Nick_M, who is exceptional on issues of DAX optimization.

  • Brian

Some quick things I noticed that may help:

  • Remove the bidirectional filters in the data model, make those 1:M
  • Remove the automatic date/time
  • in your Time table, change the data type to date and not date time
  • There are a lot of columns in your Fact table, try to remove some of them or create dimensions out of them. 56 columns in any table is too many, but even more so in your fact tables.

It all comes down to how DAX best works. At a high level DAX best works with skinny/long tables vs. short wide tables set up as a star (or snowflake) schema. Seems like you are getting your data from SQL which is set up in rows (and not columns like DAX likes) so that’s where I would I start.

  1. If you dont need a column, do not import it
  2. Fact tables should be only used for aggregating data
  3. Dimension tables should be used to filter the fact table
  4. 1:M relationships are much faster than M:M or 1:M bidirectional

Only after setting up the data model the best for DAX would I start to look to optimize the actual DAX expression.

4 Likes

@mno @Nick_M

Hi thanks for the suggestions. I slimmed it down and updated formula, luckily it is coming back with data.

I am very close, I think if I can get running total working then it could work Here is what I did:

I changed @mno formula a little bit:

Pareto Sales 2 = 
VAR ActualSales = [Enterprise total Sales] 
VAR SelectedSales = CALCULATE(
   [Enterprise total Sales],
    ALLSELECTED( GROWER[CRM_PARENT_GROWER_MASTER_KEY] )
) 
VAR VT = FILTER(
    SUMMARIZE(
        ALLSELECTED( RETAILER[Parent Retailer AB Number] ),
        RETAILER[Parent Retailer AB Number],
        "@Sales", [Enterprise total Sales]
    ),
   [@Sales] >= ActualSales
) 
VAR Calc1 = SUMX( VT,[@Sales] ) 
VAR Result = DIVIDE( Calc1, SelectedSales, 0 ) Return Result    

Now, I think if I get running total of this and then divide it by overall sale (in this case, 3.1 million) then I can get 80% of total sale. Very close.

Here is the updated file:

https://drive.google.com/file/d/124NY4554jFydtDIqcgIoqLwaPjuFhHFE/view?usp=sharing

Again, really appreciate all the help! Please let me know, how to get to that last step.

Ritesh

Hi Ritesh,

You are close, but no cigar :wink:

  1. One mistake in the below VAR ( after ALLSELECTED the reference column was wrong):
VAR SelectedSales =
CALCULATE(
   [Enterprise total Sales],
    ALLSELECTED( RETAILER[Parent Retailer AB Number] )
Pareto Sales 2 =
VAR ActualSales = [Enterprise total Sales] 
VAR SelectedSales = CALCULATE(
   [Enterprise total Sales],
    ALLSELECTED( RETAILER[Parent Retailer AB Number] )
) 
VAR VT = FILTER(
    SUMMARIZE(
        ALLSELECTED( RETAILER[Parent Retailer AB Number] ),
        RETAILER[Parent Retailer AB Number],
        "@Sales", [Enterprise total Sales]
    ),
   [@Sales] >= ActualSales
) 
VAR Calc1 = SUMX( VT,[@Sales] ) 
VAR Result = DIVIDE( Calc1, SelectedSales, 0 ) 

Return 
Result
  1. If you change logic in measure, the change should also be applied to the table.

This reference column: [CRM_PARENT_GROWER_MASTER_KEY]
has been replaced with this one: RETAILER[Parent Retailer AB Number]

After smal adjustment, the measure is working, but again, it is tooooo slow (unusable).
Maybe the approach to solve the problem should be different.

image

Take care,
Mariusz

Thanks @mno but I need this table to have individual growers in that table, not retailers. When you changed the formula to Retailer, it gives me data at the Retailer level. I need it at the grower level.

So I need it to sum those grower numbers (percentages).

Thanks,
Ritesh

Ritesh,

In that case, the first formula is to be used.

Pareto Sales =
VAR ActualSales = [Enterprise total Sales] 
VAR SelectedSales = CALCULATE(
   [Enterprise total Sales],
    ALLSELECTED( GROWER[CRM_PARENT_GROWER_MASTER_KEY] )
) 
VAR VT = FILTER(
    SUMMARIZE(
        ALLSELECTED( GROWER[CRM_PARENT_GROWER_MASTER_KEY] ),
        GROWER[CRM_PARENT_GROWER_MASTER_KEY],
        "@Sales", [Enterprise total Sales]
    ),
   [@Sales] >= ActualSales
) 
VAR Calc1 = SUMX( VT,[@Sales] ) 
VAR Result = DIVIDE( Calc1, SelectedSales, 0 ) 

Return 
Result

Take care,
Mariusz

Lol. I am in catch 22. The first formula doesn’t return anything for me. It just spins. Is there a no way to add the running sum with the formula variation that I did?

Selected Sales: Grower
VT : Retailer

Pareto Sales 3 = 
VAR ActualSales = [Enterprise total Sales] 
VAR SelectedSales = CALCULATE(
   [Enterprise total Sales],
    ALLSELECTED( GROWER[CRM_PARENT_GROWER_MASTER_KEY])
  
) 
VAR VT = FILTER(
    SUMMARIZE(
        ALLSELECTED( RETAILER[Parent Retailer AB Number] ),
        RETAILER[Parent Retailer AB Number],
               
        "@Sales", [Enterprise total Sales]
    ),
   [@Sales] >= ActualSales
) 
VAR Calc1 = SUMX( VT,[@Sales] ) 
VAR Result = DIVIDE( Calc1, SelectedSales, 0 ) Return Result

Ritesh,

I dare to disagree :slight_smile:

The first measure is working like a charm if you apply some filters.

image

I think, either your model or approach to solve the problem, should be revised.

Good luck!

Mariusz

Ritesh,

They say, “The only easy day was yesterday”.
I have found the solution that is working as a charm.

  1. Simple measure ranking.
  2. Parameter which adjusts TopN based on ranking.
  3. Sales measure that includes only selected customer based on sales ranking.
  4. Simple measure that shows % of sales generated by filters customers.
  5. Once you achieve 80% you stop and that’s your customers.

You have to just play with the slicer. It’s better than just waiting, waiting and waiting :slight_smile:

Ranking measure.

Ranking ETS =
RANKX(
    ALLSELECTED( GROWER[CRM_PARENT_GROWER_MASTER_KEY] ),
   [Enterprise total Sales],,
    DESC
)

TopN Sales measure

TOPN Sales =
VAR TopNSelected = SELECTEDVALUE( TopN_Growers[TopN_Growers] ) 
VAR Result = IF( [Ranking ETS] > TopNSelected, blank(),[Enterprise total Sales] ) 

RETURN
IF(
    ISINSCOPE( GROWER[CRM_PARENT_GROWER_MASTER_KEY] ),
    Result,
    SUMX(
        FILTER(
            ADDCOLUMNS(
                VALUES( GROWER[CRM_PARENT_GROWER_MASTER_KEY] ),
                "@sales", [Enterprise total Sales]
            ),
           [Ranking ETS] <= TopN_Growers[TopN_Growers Value]
        ),
       [@sales]
    )
)

% of sales generated by filtered customers

TOPN Sales % =
DIVIDE( [TopN Sales],[Enterprise total Sales], 0 )

The result: 8570 grower master keys generate 80% of total sales.

Does it make any sense?

Cheers,
Mariusz

Thanks @mno. I really appreciate you spending so much time on this. Its so sincere.

Unfortunately, I proposed similar solution to begin with to the client. Please see this pic. It should be the first tab in the original file that I sent in the beginning. I was also using TopN to reach to the 80%.

They said no. They needed to see static list without their sales people doing sliders and reaching to 80%.

Ritesh,

I was thinking about your case, trying power query and DAX and nothing is working as expected.
However, I’m only PBI amator, so I believe PBI experts will help you out.

I’m curious how your challenge could be solved.
If you come up with the solution and if you don’t mind, could please share?

Have a good weekend.
Mariusz

@mno So my boss came to rescue. He created pretty powerful and elegant SQL with some subqueries and partition by clauses. I am using that as standalone table. It works perfect for this situation and it’s pretty fast. Here it is to just give an idea. Thanks for all of your help.

SELECT
  a.crop_year_key,
  a.parent_retailer_key,
 -- a.parent_retailer_ab_name,  
  a.crm_parent_grower_master_key,
  a.grower_reported_name,
  a.grower_physical_city,
  a.grower_physical_state,    
  a.retailer_grower_sales_amount,
  a.total_sales_amount,
  a.total_sales_amount_pct,
  a.running_pct
FROM
  (
 
SELECT
  wfas.crop_year_key,
  wfas.parent_retailer_key,
  wfas.crm_parent_grower_master_key,
  wfas.grower_reported_name,
  wfas.grower_physical_city,
  wfas.grower_physical_state,    
  wfas.retailer_grower_sales_amount,
  wfas.total_sales_amount,
  ((wfas.retailer_grower_sales_amount / wfas.total_sales_amount) * 100) total_sales_amount_pct,
  sum(((wfas.retailer_grower_sales_amount / wfas.total_sales_amount) * 100)) over (partition by parent_retailer_key 
  order by ((wfas.retailer_grower_sales_amount / wfas.total_sales_amount) * 100) desc) as running_pct
FROM (
        SELECT crop_year_key,
               parent_retailer_key,
               wdgm.crm_parent_grower_master_key,
               wdgm.grower_reported_name,
               wdgm.grower_physical_city,
               wdgm.grower_physical_state,                   
               SUM(pos_sales_amount) OVER (PARTITION BY crop_year_key, parent_retailer_key, wdgm.crm_parent_grower_master_key) retailer_grower_sales_amount,
               SUM(pos_sales_amount) OVER (PARTITION BY crop_year_key, parent_retailer_key) total_sales_amount,
               ROW_NUMBER() OVER (PARTITION BY crop_year_key, parent_retailer_key, wdgm.crm_parent_grower_master_key ORDER BY invoice_number) keep_flag
          FROM wf_marts.wfs_f_agrimine_pos pos
          INNER JOIN wf_marts.wfs_d_grower_master_v wdgm ON pos.grower_master_key = wdgm.grower_master_key
          INNER JOIN wf_marts.wfs_d_product wdp on pos.product_key = wdp.item_number_id
         WHERE crop_year_key in( 2019,2020) and wdp.business_segment_name = 'CROP PROTECTION' and parent_retailer_key <> 0 -- and parent_retailer_key IN ( 7023923, 7025806)
       ) wfas
WHERE
  wfas.keep_flag = 1 
 
ORDER BY
  wfas.parent_retailer_key,  ROUND((wfas.retailer_grower_sales_amount / wfas.total_sales_amount) * 100) DESC
  ) a
WHERE
  a.running_pct < =100 
       ;