Counting and detecting duplicates or repeats

Hello,

I have a number of patients that have received testing. I’m trying to determine how many of those patients have been tested more than once. I have a unique key for each result but no unique ID for each patient. As a result, I am trying to count how many potential repeat patients there are based on values like phone number, dob, first name, and last name.

The outcome I am hoping for is something like if dob, first_name, last_name & phone match than count how many times that patient has come.

I’m having a hard time trying to write this in Dax.

I tried using something like this below but Earlier is not working for me at all:

Occurence =
COUNTX (
FILTER ( yourTable, EARLIER ( yourTable[Col A] ) = yourTable[Col A] ),
yourTable[Col A]
)

Any help or thoughts would be much appreciated.

@matthew.wright,

A few thoughts:

  1. EARLIER is a beast to get to work in a measure. Try using it in a calculated column instead.
  2. Set your calculated column up to return a 1 if a match with previous, 0 otherwise . Then you can sums, counts, etc. easily via measures on that binary calculated column.
  3. @Greg’s DAX Pattern section on Previous Row Value has a wealth of information on how to do this.

I’ve got to run to a meeting now, but if you can provide a sample dataset, happy to work with you on the specific DAX.

I hope this is helpful.

  • Brian

@BrianJ,

I’ll try a calculated column after lunch and report back. I was hoping to be able to do it in a DAX measure instead.

Could you point me to the articles @Greg has on this? I would love to explore what he has.

I’ll try to produce a sample file with a small set of data.

@matthew.wright

Here you go

@matthew.wright

EARLIER allows you to access previous row context by specifying the number of row contexts to go back, a row context for calculated column is available when you click “new column”.

If you were to create a calculated column there would be 2 row context, 1 created by the calculated column on the existing table and the second created by FILTER. that’s how EARLIER is able to access the previous row context from the calculated column

By being able to access the value of the currently iterated row outside the code of calculated column (remember every cell is iterated in a calculated column because of the row context) you are able to filter the table inside FILTER with the help of EARLIER, where the value of the currently iterated row inside FILTER matches with the currently iterated row by the calculated column.

In your measure the first and only row context is coming from FILTER and not from outside it, your measure is basically saying COUNTROWS ( FILTER ( Table, 1 = 1 ) ), hence EARLIER can’t be used in this scenario

Replace EARLIER with SELECTEDVALUE ( yourTable[Col A] )

Also, I would suggest you stop using EARLIER and start using Variables, as in cases of nested row context EARLIER/EARLIEST are way too confusing.

2 Likes

@AntrikshSharma Is your book “The Definitely Definite Guide to Dax” available no on Amazon?

2 Likes

@MudassirAli I was watching FRIENDS but your comment made me crack up more than the entire show has ever done! Probably in 4-5 years when I can say with confidence that now I know DAX, lol :stuck_out_tongue:

1 Like

@AntrikshSharma
I am glad I was able to help :joy: :rofl:

1 Like

Hi @matthew.wright.

I haven’t tried this, but perhaps you could create a calculated column on your [Tests] table that generates such a patient key, something like:

Patient Key =
COMBINEVALUES( "-", RELATED( Patients[DOB] ), RELATED( Patients[First Name] ), RELATED( Patients[Last Name] ), RELATED( Patients[Phone] ) )

then count the occurrences?

Hope this helps.
Greg

1 Like

@matthew.wright,

I remembered that @Harsh recently put together a great list of resources and approaches for handling this problem.

Creating a report to list possible duplicates - #3 by Harsh

Along the same lines, @Greg’s post got me thinking - there’s likely a 90 second solution to this problem- in PQ do a GROUPBY on the fields that together define a duplicate and set the resulting column to count. Anything with a count > 1 is a duplicate.

  • Brian
1 Like

Hi @matthew.wright, did the response provided by the experts help you solve your query? If not, how far did you get and what kind of help you need further? If yes, kindly mark the thread as solved. Thanks!

Hi @matthew.wright, a response on this post has been tagged as “Solution”. If you have a follow question or concern related to this topic, please remove the Solution tag first by clicking the three dots beside Reply and then untick the check box. Also, we’ve recently launched the Enterprise DNA Forum User Experience Survey, please feel free to answer it and give your insights on how we can further improve the Support forum. Thanks!