Creating a report to list possible duplicates

Hi

I’ve been asked if I can create a report to list possible duplicates for checking. The database has over 70,000 records.

My thought is to have filter options the user can turn on or off, such as requiring surnames and/or first names to match.

Other filters might be email match or address match etc.

Once the filters are selected, I envisage a list appearing that reflects them.

For example, you might want to check the data using first name, surname and email match. You might then get:

123456 A B Sample absample@gmail.com
789012 A B Sample absample@gmail.com
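The idea above can be sketched outside Power BI to make the logic concrete. This is only a rough illustration in Python, not a Power BI solution: the field names (`first_name`, `surname`, `email`), the third record, and the `possible_duplicates` helper are all assumptions made up for the example. Records are grouped on whichever fields are switched on, and any group with more than one record is reported.

```python
from collections import defaultdict

# Hypothetical sample data; the third record is invented to show a partial match.
records = [
    {"id": "123456", "first_name": "A B", "surname": "Sample", "email": "absample@gmail.com"},
    {"id": "789012", "first_name": "A B", "surname": "Sample", "email": "absample@gmail.com"},
    {"id": "345678", "first_name": "A B", "surname": "Sample", "email": "other@example.com"},
]

def possible_duplicates(rows, match_fields):
    """Return lists of ids whose values agree on every field in match_fields."""
    groups = defaultdict(list)
    for row in rows:
        # Normalise case and surrounding whitespace so trivial differences do not hide a match.
        key = tuple(row[field].strip().lower() for field in match_fields)
        groups[key].append(row["id"])
    # Only groups containing more than one record are possible duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]

strict = possible_duplicates(records, ["first_name", "surname", "email"])
loose = possible_duplicates(records, ["first_name", "surname"])
print(strict)
print(loose)
```

Turning the email filter off widens the match: the strict call groups only the two records sharing an email, while the loose call also pulls in the third record.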

Any help getting started with this would be much appreciated.

Hello @KieftyKids,

Thank you for posting your query on the forum.

I’m providing links to some videos and blog posts, as well as a solution that was provided on the Power BI Community Forum for a similar query. They show how this can be achieved in Power Query as well as with DAX.

I hope you find these useful and that they help with your analysis. :slightly_smiling_face:

Thanks & Warm Regards,
Harsh

Hi @KieftyKids, did the response provided by @Harsh help you solve your query? If not, how far did you get, and what further help do you need? If yes, kindly mark the thread as solved. Thanks!

Thanks for the links.

I’ve created some calculated columns to determine whether a field is a duplicate or not. I’ve checked the data and it is as it should be.

However, when I applied both filters to the table in my visualisation, they didn’t behave as I expected. I thought the list would reflect both filters being applied together.

The email filter is working fine but the first name filter isn’t. That is, I expected the list to include only those records whose first name and email both match.
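One possible cause, offered only as a guess since the actual calculated columns are not shown, is that a separate duplicate flag per field is not the same as a duplicate on both fields together. The Python sketch below uses invented rows and field names to show the difference: a record can share its first name with one record and its email with a different one, so it passes both independent flags without having a true duplicate.

```python
from collections import Counter

# Hypothetical rows, invented purely to illustrate the point.
rows = [
    {"id": 1, "first_name": "Ann", "email": "ann@example.com"},
    {"id": 2, "first_name": "Ann", "email": "bob@example.com"},
    {"id": 3, "first_name": "Bob", "email": "bob@example.com"},
]

# Independent counts, analogous to one "is duplicate" column per field.
first_counts = Counter(r["first_name"] for r in rows)
email_counts = Counter(r["email"] for r in rows)

# Count of the combined key, analogous to one column over both fields.
pair_counts = Counter((r["first_name"], r["email"]) for r in rows)

# Records flagged when the two independent flags are both true.
both_flags = [
    r["id"] for r in rows
    if first_counts[r["first_name"]] > 1 and email_counts[r["email"]] > 1
]

# Records that share first name AND email with another record.
combined = [
    r["id"] for r in rows
    if pair_counts[(r["first_name"], r["email"])] > 1
]

print(both_flags)
print(combined)
```

Here record 2 is flagged by the two separate columns even though no other record matches it on both first name and email, which is why a single flag built on the combined key may be closer to the intended result.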


Would this have to do with the data modelling?


Hi @KieftyKids,

Have you explored Fuzzy matching?

https://www.sumproduct.com/blog/article/power-query-blogs/power-query-fuzzy-matching
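To give a feel for what fuzzy matching does, here is a minimal sketch in Python using the standard library's `difflib`. It is not Power Query's fuzzy matching and uses its own similarity measure; the names, ids, and the 0.85 threshold are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Similarity ratio between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical names keyed by record id.
names = {
    "123456": "Jon Smith",
    "789012": "John Smith",
    "345678": "Mary Jones",
}

THRESHOLD = 0.85  # assumed cut-off; would need tuning against real data

# Compare every pair of records and keep those at or above the threshold.
candidates = [
    (id_a, id_b)
    for (id_a, name_a), (id_b, name_b) in combinations(names.items(), 2)
    if similarity(name_a, name_b) >= THRESHOLD
]

print(candidates)
```

Comparing every pair grows quadratically with the number of records, so with over 70,000 records it would likely make sense to compare only within smaller groups, for example records that already share a surname.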


If you need further assistance, please also provide the desired outcome and a sample data set.
Thanks!