Distinct Count Most Recent Data by Employee for Selected Period

MalcolmJ · July 15, 2021, 9:40pm

Distinct Count Most Recent Data by Employee for Selected Period

The issue: When trying to apply a distinct count to employee history I am receiving accurate totals but the counts by a particular dimension are off (Location, Title, or Department). I have created a simplified data model to illustrate the issue.

What do I mean?
There are only 2 employees at this fake company, Bob and Mike. The company is frequently giving promotions, moving employees around departments, and geographical locations.

We need to be able to dynamically establish headcounts for the various dimensions (Location, Title, or Department) for any given time range selected. For example in the PBIX we have selected Q1 and Q2 for 2021. Part way into 2021, Bob was promoted from an Intern to an Associate and Mike moved from Dalls to Reno. I have given a preview of the transactional data in the upper left table.

Now if we reference the smaller matrixes and tables we can see the employee counts by the various dimensions. In many of the dimensions it appears we 3 employees!

In an attempt to solve the issue myself I created a calculated column in the Employee History table which ranks the transactional data by employee. This appears to be working as the transactions are ranked properly.

Rank =
VAR TkinitRank = ‘Employee History’[Employee ID]
Return
RANKX(
FILTER( ‘Employee History’ , ‘Employee History’[Employee ID] = TkinitRank ) ,
‘Employee History’[PeriodEndDate] ,
‘Employee History’[PeriodEndDate] , DESC , Dense )

My next thought was using this rank to limit the Headcount measure (using DISCTINCTCOUNT) to only look at the most recent transactional data for each employee to determine the count.

This did not work, as can be seen from the snip above.

Ideal Solution:
One measure which can be used to determine the headcount for any dimension based on the date range selected in the slicer. The measure would use the most recent data for each employee for that period. By the slicer selection in the snip, no one would show in Dallas (Mike moved to Reno) and there would be no intern (Bob was promoted).

Basically I would like the distinct count to only pay attention to the row corresponding to the last PeriodEndDate (dynamically filtered by the date slicer) by employee.

Any assistance would be greatly appreciated.

Thanks
Malcolm

PBIX is here:
Distinct Count Most Recent Data by Employee for Selected Period.pbix (82.9 KB)

BrianJ · July 18, 2021, 6:34pm

@MalcolmJ ,

Interesting problem, which is not at all uncommon for Slowly Changing Dimensions like the ones you have here.

I took a different tack in my approach, but there are a lot of different ways to handle this issue. Here are the steps I took:

Created a disconnected date table consisting of just one column drawn from Dates[Date]
Created a slicer based on that disconnected date field (you can use any structure slicer you want, I did a “before” slider just to make testing easier.
Created a measure to harvest the max date from that slider

Developed the following measure to calculate the max period end date for each employee:

Max Date by Employee =

 CALCULATE(
     MAX( 'Employee History'[PeriodEndDate] ),
     FILTER(
         ALL( 'Employee History' ),
         'Employee History'[PeriodEndDate] <= [Harvest Disconn Date]
     ),
     ALLEXCEPT(
         'Employee History',
         'Employee History'[Employee ID]
     )
 )

Created a second measure that equals 1 if the period date equals the max date for each employee

Is Max Date =

IF(
    SELECTEDVALUE( 'Employee History'[PeriodEndDate] ) = [Max Date by Employee],
    1,
    0
)

Modified the headcount measure to filter the employee table by the measure above:

Headcount IsMaxDate =

CALCULATE(
    COUNTROWS(
    FILTER(
        'Employee History',
        [Is Max Date] = 1
    )
)
 )

And presto! Proper headcount, fully dynamic:

I hope this is helpful. Full solution file attached below.

Brian
eDNA Forum -Distinct Count Most Recent Data by Employee for Selected Period Solution.pbix (93.8 KB)

P.S. Fun problem. Great job by you in providing a clear explanation of the issue, and all the relevant information including PBIX. Makes providing good support much easier.

MalcolmJ · July 19, 2021, 5:44pm

@BrianJ big thanks!

I have previewed the PBIX you’ve provided and this appears to be a GREAT solution. I will attempt to apply to the real data model and let you know if I run into any issues.

Thanks again!

MalcolmJ · July 20, 2021, 6:24pm

Update: Have moved over to the real data model and it is working VERY well.

Thanks again @BrianJ

Malcolm

BrianJ · July 20, 2021, 6:59pm

@MalcolmJ ,

That’s great - thanks for the update. Glad to hear the solution is working well for you.

Brian

Distinct Count Most Recent Data by Employee for Selected Period

Rank = VAR TkinitRank = ‘Employee History’[Employee ID] Return RANKX( FILTER( ‘Employee History’ , ‘Employee History’[Employee ID] = TkinitRank ) , ‘Employee History’[PeriodEndDate] , ‘Employee History’[PeriodEndDate] , DESC , Dense )

Rank =
VAR TkinitRank = ‘Employee History’[Employee ID]
Return
RANKX(
FILTER( ‘Employee History’ , ‘Employee History’[Employee ID] = TkinitRank ) ,
‘Employee History’[PeriodEndDate] ,
‘Employee History’[PeriodEndDate] , DESC , Dense )