Latest Enterprise DNA Initiatives

Removing duplicates in a large table

Hello Everyone

I have a table containing 3 million or so transaction records.

I need to find for the unique customer ID’s in this table and generate another table containing all the unique customer ID’s. I need this list to filter out some other data

So, what I’m doing now is duplicating this table, removing all the columns except the Customer ID column and then doing a remove duplicates in Power Query. That does not work for me Power BI never comes back and just spins endlessly with getting the results I wanted. Meaning a table (query) with just my unique Customer ID’s from my 3 million + transaction table.

There must be a better way to get all my unique Customer ID’s from a 3 million + table.

What should I do ?

Thanks

Eric (Montreal, Canada)

1 Like

Hi ,

You can create a cal table where you can use function Values and it will do it for you.
Attaching documentation link so you can understand that function

Thanks,
Anurag

2 Likes

Hi @ericet I think what @Anurag suggested is one solution.

If you have a table and you duplicate it, you will potentially be loading the 3m rows AND going through the all the steps in your original query.

If you wanted to complete this in Power Query then perhaps use some M code to pull in your distinct list.

List.Distinct - PowerQuery M | Microsoft Docs

So I got it to work, use List.Distinct as I did on one of my tables, it creates a list., then I converted it to a table.

Open a blank query and type in = List.Distinct(YOURTABLENAMEHERE[YOURCOLUMNNAMEHERE])

This is how my example below looked when complete

let
Source = List.Distinct(ACCOUNT_SUFFIX[#“Account Name”]),
#“Converted to Table” = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
in
#“Converted to Table”

David

4 Likes

@ericet Try to this operation in the database itself if that option is available. Either with a view or a table.

1 Like

Of course, this would be the first port of call @AntrikshSharma …I think I have Friday brain today, it’s been a loooooooooooooooooong week :exploding_head: :rofl: :brain:

1 Like

Thanks everyone, I knew there was a better way to build a unique set of values. That’s way I asked the question.

But … I’m doing something wrong.

= List.Distinct(“Fait Paie Unite Prod”[“Id Unite Production”])

1

1 Like

Hey @ericet when you have spaces in your table and/or column name you need to have it like below

=List.Distinct(#“Fait Paie Unite Prod”[#“Id Unite Production”])

With a # and then "

3 Likes

Happy Monday/Lundi @ericet

Did you manage to get my suggestion working?

David

Yes !! This is an important trick. It’s the most efficient way to generate a distinct list of values. I was able to do it before but this trick beats all the useless method I was doing before.

Thank you DavieJoe and others on this forum for your expertise.

Have a great week

Eric (Montreal, Canada)

2 Likes

No worries @ericet more than happy to help. My M Code knowledge isn’t particularly extensive but lucky for you the issue you were having was something I’d stumbled upon myself not long beforehand.