Identify non duplicates, to remove, and then calculate time spend on two steps, and average as well

Hello,
I am hoping to be able to automate this data metrics and calculations. I need to identify non duplicates - and remove them from the list or just ignore in the calculations. Also, in the cases where there are more than one Step (like step B in the example), the more recent date should be kept, the other one deleted. Then every Overall Process should have Step A and Step B. As in example - only the ones in Green should be kept for calculations.
In excel i would just do custom sort (1st level Overall process(asc), 2nd - Step (desc) and then use the formula for the average time to calculate = (B Complete time on - A Created on)*24.

If i anyone has done anything similar or would have any advice it would be most appreciated.

Thanks a million in advance. Example.xlsx (9.7 KB)

@Jetuke_ES,

Really interesting problem. I think I’ve got a solution that filters your records down automatically only to the ones you want.

In Power Query:

First, created a “Duplicate Key”, concatenating the first four fields

Then created the following grouping:

After expanding, created this grouping:

Then, in DAX created the following “Keep Record” calculated column, working logic off of the two grouping columns created above in Power Query, and then filtered the data table only to keep the records where “Keep Record” = “Yes”:

Hope this is helpful.

Full solution file attached below.

Hi @Jetuke_ES,
.
Based on what you wrote here:

.
This doesn’t seem right, please check that, thanks.

.
So first I Grouped the Rows by: “Status”, “Overrall Process (short)” and “Overall Process”
That allowed me to identify records that were only listed once and remove them. Next I sorted the data descending, extracted it again and finally Removed Duplicate ( “Status”, “Overrall Process (short)”, “Overall Process”, “Step”)

.
I hope this is helpful.
eDNA - Identify data sets to keep.pbix (26.4 KB)

@Melissa,

Wow! Same approach, 30 seconds apart. It’s almost as if we’re all stuck inside, trying to keep ourselves occupied with interesting forum posts…

  • Brian
1 Like

@BrianJ

Yeah all this “social distancing” makes me want to “Group By” and there is only one place where that’s safe and that’s inside Power Query!

1 Like

Melissa, you are right, i highlighted the wrong row!! :slight_smile: I will have a look at the solution, amazing to see that it is actually possible (probably). How could i then minus the completed on minus created on to get the time taken for the Process? And then Calculate the average as a whole?

Stay well, stay safe.

erika

That’s where “Duration” comes in… I’ll update my PBIX and attach it here in a moment.

Hey Brian! Brilliant, thanks for looking at this. I was also asking Melissa:
How could i then minus the completed on minus created on to get the time taken for the Process? And then Calculate the average as a whole? Do you have any ideas?

Thank you both so much,
erika

@Jetuke_ES,

How could i then minus the completed on minus created on to get the time taken for the Process? And then Calculate the average as a whole? Do you have any ideas?

I do, but I made the tragic mistake of running upstairs for a snack and a drink. With a 10 minute head start, @Melissa’s solution should be arriving any moment now… :smile:

  • Brian

:slight_smile: True that! Thanks Brian! stay safe :slight_smile: really appreciate your both help.

.
So I hope this is what you were expecting:

.
I identified the Min value for the Created On for both Steps and the Max value for the Completed on, subtracted them and multiplied by 24.

eDNA - Identify data sets to keep.pbix (36.4 KB)

Looks exactly what i need Melissa! you helped me out again. :smile: Can i just ask one last thing please: how did you identify min/max per Overall Process per person? did you use Calculate and Earlier functions?

No nested table funcions inside Power Query… So I wrote a bit of M code, here’s what I did:
Made another Group By table containing AllRows then transformed that nested table by adding 2 columns one for the Min Created On date of that set and one for the Max Completed on date of that set.


.
Extracted all the data, you can see that in the Applied Steps “Inserted Time Duration”

1 Like

@Jetuke_ES,

Here’s my DAX-based solution:

image

Elapsed Days = 

VAR ElapsedSeconds =
CALCULATE(
    DATEDIFF(
        [Min Created On],
        [Max Completed On],
        SECOND
    ),
    FILTER(
        Data,
        Data[Keep Record] = "Yes"
    )
)
VAR SecondsPerDay = 24*60*60

RETURN
DIVIDE(
    ElapsedSeconds,
    SecondsPerDay,
    0
)
        
Average Days Elapsed = 

VAR vTable =
    FILTER(
        Data,
        Data[Keep Record] = "Yes"
    )

VAR vTable2 =
    ADDCOLUMNS(
        SUMMARIZE(
            vTable,
            Data[Overall Process]
        ),
    "Days Elapsed", [Elapsed Days]
    )

RETURN
AVERAGEX(
    vTable2,
    [Days Elapsed]
)

Full solution file below:
eDNA Forum - Process Duplicates Solution.pbix (49.9 KB)

  • Brian
1 Like

That’s great! Thank you both so much!! Ill give it ago, if works with all my data. But all looks like what i need. will ping you if not.

Thanks once again!

Hi @Jetuke_ES, please don’t forget if your question has been answered within the forum it is important to mark your thread as ‘solved’. Thanks!

@Jetuke_ES,

Glad to help – that was fun. FYI – I revised the Elapsed Days measure and the solution file posted. Still returns the same results, but is more robust if you end up putting more information into the table visual.

Just give a shout if you have any questions when you try these out with your data.

  • Brian

Hi both,
all is working out perfectly up until the calculating the A and B step per Person duration to complete. Think i missing something, so sorry to be bothering again. Did you do the new Group By by Step, Overall Process and in Operation you did Max completed on and Min Created on for AllRows? Think im really close just missing something.

Okay that’s sounds like the part with the nested table functions I illustrated yesterday. Please see if you can fix it by examining the M code for the each of the Applied Steps in the provided PBIX.

If you can’t resolve it, post another sample that matches your column headers and structure exactly. Then I’ll set it up for you.

Melissa, that worked out for me! :slight_smile: Could i possibly identify if there are weekend in between the Closed on and Created in between Steps? So the Duration would only cover week days and no weekends?

In excel i was just doing the Networkdays function. In this case would you use Calendar function between the MIN and MAX dates that are used in the duration calculation?

Thank you so much.