Remove duplicates by filterd last session per customer

Hello, everyone.

I get customer data from a database, which are updated per session. Through these updates the customer table can contain duplicates, if for example something changes in the customer data, these changes are written as a new line with the same customer id again in the table. These duplicate customer id’s have to be filtered so that I only get the values with the last updates. So the customer table contains no duplicates and the newest customer master data. Thanks for your help on how to do this in Power QueryTest.pbix (80.2 KB) test_data.xlsx (13.4 KB)

I’m sure there are multiple ways to accomplish this, my method was to do it in Power Query

  1. group the customer_table by customer_id
    this step is easy, with the group by command on the ribbon
    image

  2. Take the group table, and sort it by update_date
    this step is a bit tricky, you have to edit the M code to do it
    (hint: in learning this technique, I find myself building the steps I want on an ungrouped table, and then reviewing the M code to update it - see the section below the similarities between the two lines of code)

    Normal Code = Table.Sort(#“Prior Step”, {{“update_date”, Order.Descending}})
    My Code: = Table.AddColumn(#“Grouped Rows”, “Sorted Data”, each Table.Sort([Data] ,{{“update_date”, Order.Descending}}))

  3. using the same technique as in step 3, I added an index column to the grouped data (with this step complete, you will have three sub-tables in your customer_table

  4. using the same technique, create a new column that filters the sub-tables to index 0 (first number in the index column, which should be the latest data with the sub-tables sorted by update date)

  5. Delete the first three columns (named in the solution file as “Data”, “Sorted Data”, and “Data with Indes”

  6. Expand the “Final Data” column

  7. I moved your column type step down to the end of the steps to properly type all of the columns.

Solution file is attached
NOTE: it is possible to combine the grouping, sorting, indexing steps into a function instead - but this does not seem to be something you are going to need to repeat in multiple locations - so I think solving it for the specific case makes more sense.
I could also have combined steps, but this solution should let you see what is going on more easily.

eDNA solution - filtering table to latest date with grouping.pbix (80.9 KB)

1 Like

@Heather thank you for your quick reply. It is not possible to use the update date in the customer table, I have other tables where I have no update column but the same scenario. The filter value has to come from the session table

Can the session_id be sorted as I did with the update_date?
if so, you simply need to replace the column named in the update step to session_id

if you can’t sort the session_id, then perhaps you can add an index step to the full table (prior to grouping) and then sort based on that index?

I did a merge of the session id and fetched the updated column into the customer table and then sorted it like you did with the update_date and it works great

Glad you were able to make this work for your data set!