Data Modeling - Normalization

Hi All,
I am trying to normalize a flat file into dimensions for Customer.
After performing the steps, I have also removed duplicates from the Customer dimension table, but when I try to join the relationship to the Sales fact table, it gives me a warning for Many to Many relationships. I cannot seem to understand what the underlying problem is. I am attaching the file, if any one can point out to the issue here.
Thanks,
Mustafa
Sales 2024 BI.pbix (7.5 MB)

I did a quick view of your Sales Table with your Customer Table. Why do you have all the Customer information in the Sales Table? I would delete all the columns in the Sales Table that are also in the Customer Table except the Customer (number) column. You don’t need in both tables.

Check to make sure that your data types are the same in both tables.
By the looks of it you didn’t delete all the duplicates by the screen print
image
1313 rows 1275 distinct values

You may have ending spaces in the column that you will need to do the trim on it.

I can’t test it because the source gives me an error when i go into the transform data option.

I hope this helps.
Keith

Answer: your customer table doesn’t appear to have any unique ID fields - I added a COUNT on your Customer Table

Row Count = COUNTROWS( Customers )

and then tried this against single columns that appear to be the logical IDs, I did not find any that showed only 1 for the Row Count measure against every value in the column. (Image below)

What column are you trying to join on? You may need a bridge column to make this work (something that combines two fields to create a unique ID). Also, please note @Keith 's comments - if you were not planning to remove the extra columns from the sales table, please rethink that. This will remove some bloat from your final report.

Hi Both Heather & Keith,

I did use the Trim function and removed duplicates and even when I used the Countrows on the Customer table it shows 1 for each customer. I’m not sure why is it still showing as Many to Many when I try the join. Please see the screen shots.
Is there any thing else i’m missing…



you have 1313 rows but there are 1275 distinct values. (as you can see in screen print in my message) You have duplicates some place in your customer tables. What field are you matching within the customer table. You need to match up 1275 distinct values not the number of rows.

I tried matching Customer column(Customer table) to Customer column (Sales Table)

When you did the trimmed text function, was it on Customer column in the Customer table? (i’m drawing a straws here)

Sorry I can’t think of anything else

Hi Keith,
I found out the solution, but think it is strange.
When I removed the duplicates, I selected all the columns in the Customer table and then removed Duplicates, this is the code that Power Query generates: = Table.Distinct(#“Removed Duplicates”)
But when I select only the Customer column in the Customer table and then remove Duplicates, it works and this is the code:
= Table.Distinct(#“Removed Duplicates1”, {“Customer”})
So finally Duplicates are removed with this and I am able to join the Customer table with the Sales table through Customer Code.
And yes, I will be deleting the other irrelevant customer columns in the Sales table except the Customer code column.
But, correct me : isn’t duplicates removed when you select all the columns, that’s my learning today, but Ive seen the guys on Youtube select all the columns when removing duplicates…strange…

Thanks,
Mustafa
Removed Duplicates on all columns
Removed Duplicates on the Customer column only

1 Like

you likely have the same customer at different addresses.

i’m glad my solution help that.

When you remove duplicates from the entire table - it compares ALL of the columns, and if it finds an exact match to every column, then one of the matching rows is removed.

When you remove duplicates from a single column, it only cares about that column - and ignores everything else.

I looked at one of the duplicates from your original data (Customer HA1032) - and the difference is in the Customer Name:

image

As Keith said, different addresses, even different spellings can cause errors.

1 Like

Got it guys…thanks for the advice. :grinning: