@JazZ0003,
I certainly agree with @piryani that moving forward you will need to develop clear business rules on nomenclature. However, I interpreted the problem as being that you are starting with a lot of inconsistent “dirty” data and looking for how to standardize that. If that’s the challenge, then Power Query definitely seems to me to be the way to go with this. Taking your example data, here’s how I cleaned it:
- took the original data
and transposed it so that you are working in columns, rather than rows
- for the first column use the Split transformation to break the data into three components (first number(A), x and second number (B). Then created the following custom function to place the smaller number first in the concatenation
-
used simple split transformations to extract only the relevant piece for columns 2 and 3
-
re-transposed the cleaned data back to original form
Here’s the full M code for all the transformations:
let
Source = Excel.Workbook(File.Contents("C:\Users\brjul\Desktop\Company Data.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(Sheet1_Sheet,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Company A", type text}, {"Company B", type text}}),
#"Transposed Table" = Table.Transpose(#"Changed Type1"),
#"Duplicated Column" = Table.DuplicateColumn(#"Transposed Table", "Column1", "Column1 - Copy"),
#"Split Column by Character Transition" = Table.SplitColumn(#"Duplicated Column", "Column1 - Copy", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Column1 - Copy.1", "Column1 - Copy.2"}),
#"Split Column by Character Transition1" = Table.SplitColumn(#"Split Column by Character Transition", "Column1 - Copy.2", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"Column1 - Copy.2.1", "Column1 - Copy.2.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Character Transition1",{{"Column1 - Copy.1", Int64.Type}, {"Column1 - Copy.2.2", Int64.Type}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type2",{{"Column1 - Copy.1", "A"}, {"Column1 - Copy.2.1", "x"}, {"Column1 - Copy.2.2", "B"}}),
#"Added Custom" = Table.AddColumn(#"Renamed Columns", "Custom", each if [A] <= [B] then Number.ToText([A])&[x]&Number.ToText([B]) else Number.ToText([B])&[x]&Number.ToText([A])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Column1", "A", "x", "B"}),
#"Renamed Columns1" = Table.RenameColumns(#"Removed Columns",{{"Custom", "Column 1"}}),
#"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns1",{"Column 1", "Column2", "Column3"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Reordered Columns", "Column2", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Column2.1", "Column2.2"}),
#"Changed Type3" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2.1", type text}, {"Column2.2", Int64.Type}}),
#"Trimmed Text" = Table.TransformColumns(#"Changed Type3",{{"Column2.1", Text.Trim, type text}}),
#"Removed Columns1" = Table.RemoveColumns(#"Trimmed Text",{"Column2.2"}),
#"Renamed Columns2" = Table.RenameColumns(#"Removed Columns1",{{"Column2.1", "Column 2"}}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Renamed Columns2", "Column3", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Column3.1", "Column3.2"}),
#"Changed Type4" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column3.1", type text}, {"Column3.2", Int64.Type}}),
#"Trimmed Text1" = Table.TransformColumns(#"Changed Type4",{{"Column3.1", Text.Trim, type text}}),
#"Removed Columns2" = Table.RemoveColumns(#"Trimmed Text1",{"Column3.2"}),
#"Renamed Columns3" = Table.RenameColumns(#"Removed Columns2",{{"Column3.1", "Column3"}}),
#"Transposed Table1" = Table.Transpose(#"Renamed Columns3"),
#"Renamed Columns4" = Table.RenameColumns(#"Transposed Table1",{{"Column1", "Company 1"}, {"Column2", "Company 2"}})
in
#"Renamed Columns4"
I hope this is helpful. Full solution file and original data file posted below, so you can take a look at the detailed applied steps in Power Query if you want to.
In terms of turning the result into a usable fact table, you may want to unpivot it in the following way, Into the “narrow and long” shape well-suited to DAX:
Company Data.xlsx (8.6 KB) eDNA -Standardizing Company Data.pbix (23.3 KB)
Edit: looking at the two responses above that came in as I was putting mine together, I realize I may have completely missed the boat on this one – misinterpreting the question, and providing a detailed but highly irrelevant answer. If that’s the case – sorry, my bad…