Extracting the Title

Hey Guys,
So here is the data I have along with the desired output

Title Distributor Company SKU Desired out put
kn29842_The_Unearthin_25648 xyz x896x kn29842 The Unearthin
knEAC03CF0_A_Wiseguy_Christmas_25 xyz x896x knEAC03CF0 A Wiseguy Christmas
kn1868_Listen_2569846823 xyz x896x kn1868 Listen
baby shark and more kids songs-bounce_patrol xyz x896x kn1C9B7CD5 baby shark and more kids songs bounce patrol
back to eden xyz x896x kn3BBDC88F back to eden
kn33A70377_passport_to_the_world_yukon_77672 xyz x896x kn33A70377 passport to the world yukon

The Actual list is has several titles that don’t require any cleaning / modification while others do.
I spent quite some time to get the output but failed. -

I’d appreciate your help regarding this.

edn.xlsx (10.4 KB)

Mine’s not perfect but I guess this would do for the present table.

It basically filters out the SKU first, and then via a column split as a second step to remove the last bits and replace the remaining underscores by spaces.
The issue would be if there’s a new variable Title format added to the table, I’m curious to see other ways :slight_smile:

M Code:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Conditional Column" = Table.AddColumn(Source, "Output", each if Text.Contains([Title], [SKU]) then Text.AfterDelimiter([Title], "_") else [Title]),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Added Conditional Column", "Output", Splitter.SplitTextByEachDelimiter({"_"}, QuoteStyle.Csv, true), {"Output"}),
    #"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","_"," ",Replacer.ReplaceText,{"Output"})
in
    #"Replaced Value"
1 Like

@Maikelshuvit put it correctly “The issue would be if there’s a new variable Title format added to the table”.
The approach though does already not cover all given cases. It eats the patrol at the end of the baby shark. :blush:
This variant covers all given cases:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "Title2", each if Text.StartsWith([Title], [SKU]) then Text.AfterDelimiter([Title], [SKU] & "_") else [Title]),
    #"Split Column by Character Transition" = Table.SplitColumn(#"Added Custom", "Title2", Splitter.SplitTextByCharacterTransition({"_"}, {"0".."9"}), {"Title2"}),
    #"Replaced Value" = Table.ReplaceValue(#"Split Column by Character Transition","_"," ",Replacer.ReplaceText,{"Title2"}),
    #"Trimmed Text" = Table.TransformColumns(#"Replaced Value",{{"Title2", Text.Trim, type text}})
in
    #"Trimmed Text"

Nope, one additional step is necessary for the hyphen. Overlooked that it needs to be replaced as well. :man_shrugging:

2 Likes

I guess the baby shark was a bit hungry :stuck_out_tongue:
Thanks for pointing these out!

Hey Thank you @Matthias & @Maikelshuvit

Okay. So this is working for the data I have. but I think there will be a few scenarios some of them I have checked, where I am not getting the desired output.
We will need to make some changes, but before I share the details with you, to save some time, I need to clarify one thing from the person I got this data from.

I am not marking this as a solution for now, and will share some details hopefully by tomorrow.

Thank you for the code so far. its super clean compared to what I was trying to do. :smiley:

Hey Japjeet, for the first step I was overly cautious to bring in the underscore - that can be taken out. But for the coda I think that underscore plus number is probably not enough. There are many films with numbers in the name. Anyway, for the current data I would go for this:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "Title2", each if Text.StartsWith([Title], [SKU]) then Text.AfterDelimiter([Title], [SKU] ) else [Title]),
    #"Split Column by Character Transition" = Table.SplitColumn(#"Added Custom", "Title2", Splitter.SplitTextByCharacterTransition({"_"}, {"0".."9"}), {"Title2"}),
    #"Replaced _" = Table.ReplaceValue(#"Split Column by Character Transition","_"," ",Replacer.ReplaceText,{"Title2"}),
    #"Replaced -" = Table.ReplaceValue(#"Replaced _","-"," ",Replacer.ReplaceText,{"Title2"}),
    #"Trimmed Text" = Table.TransformColumns(#"Replaced -",{{"Title2", Text.Trim, type text}})
in
    #"Trimmed Text"

Hi @Matthias

Thank you again for coming back with revised code and it works superclean with the data i had provided. Let me share some more information ( sry I think I should have shared this earlier)

Lets consider the Title :
Kn3548s89 _ Any_Title_It_Can_Be Numbers_Letters_Alphanumeric _123456
Consider it to be in 3 parts

Part1: Kn3548s89 this will always be an SKU and not any other random variable, this SKU needs removing, in some cases as you have seen there may not be any SKU, when there is no SKU, we just remove the “_”

Part2: _ Any_Title_It_Can_Be Numbers_Letters_Alphanumeric this is the Title part, can be anything number, letter, alphanumeric or even any special character in some cases, we just get rid of the “_ ” in the Title Part

Part3: 123456, this is the last part & needs to be removed it will be after the last””, so that’s how we can identify the third part

So, for example, kn259_ charlie_777_12548678
Part1: kn259
Part2: _ charlie_777_
Part3: 12548678
And Output, in this case, should be “Charlie 777”

I have added a few Possible scenarios( last 3 rows), the desired outcome, and “Title2” which is the output of the code we are currently using.

What I was trying, was to figure out that if a title begins with an SKU, I count the number of “", and remove everything after the last "” - Well I haven’t figured out the way to do that yet. :slight_smile:

would that be correct approach ? and how do we get there ?

Again Thank you so much in advance.

[Working Excel file attached]

edn.xlsx (129.5 KB)

Hi Japjeet,
Hyphen is back. :blush:
The way you proceed depends on how do you want to treat kn259_charlie_777.

kn259_charlie_777 has bad luck, because you have not defined how to differentiate a 777 from the number coda you want to get rid of. Same for Rocky 4 and so on if they are not followed by a number coda. You assume that there is always a _ before the number coda:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "Title2", each if Text.StartsWith([Title], [SKU]) then Text.AfterDelimiter([Title], [SKU] ) else [Title]),
    #"Replaced Value" = Table.ReplaceValue(#"Added Custom",each [Title2],each if Value.Is(Number.From(Text.AfterDelimiter([Title2], "_", {0, RelativePosition.FromEnd})), Int64.Type) then Text.BeforeDelimiter([Title2], "_", {0, RelativePosition.FromEnd}) else [Title2],Replacer.ReplaceValue,{"Title2"}),
    #"Replaced _" = Table.ReplaceValue(#"Replaced Value","_"," ",Replacer.ReplaceText,{"Title2"}),
    #"Trimmed Text" = Table.TransformColumns(#"Replaced _",{{"Title2", Text.Trim, type text}})
in
    #"Trimmed Text"

I :heart: replacing, it is incredible versatile:

1 Like

yes you are correct, in a situation like : kn259_charlie_777 , it will get output as charlie.

This is exactly the scenario that I had discussed with the person who gave me this data. and I am told that any name that starts with a SKU, will always have 3 parts (at least for now :slight_smile: ) so the last number can be safely removed. so technically kn259_charlie_777 should never happen in its current naming format

In the data format ever changes, we will re look at the entire Scenario.

so Thank you.!! @Matthias that solution works

I Appreciate your support and time

Thank you @Maikelshuvit for your inputs

1 Like

and includes _
=> means you can use this code as you have a test for [SKU] in the first line:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "Title2", each if Text.StartsWith([Title], [SKU]&"_") then Text.AfterDelimiter(Text.BeforeDelimiter([Title], "_", {0, RelativePosition.FromEnd}), [SKU]&"_" ) else [Title]),
    #"Replaced _" = Table.ReplaceValue(#"Added Custom","_"," ",Replacer.ReplaceText,{"Title2"})
in
    #"Replaced _"
2 Likes

This is exactly I am Loving spending time on M Code,
Its brilliant and Nesting of codes makes it so smooth, by reducing the number of steps.

Thank you again. it works perfect.