Power Query Workout 09 - Extract Part Number

Difficulty Rating:

Introduction:

You receive a report with a column Summary, of which the first portion is a part number.
You notice that these part numbers have various lengths and structures.
Also the portion after the part number has various lengths and structures.

Your task:
Extract the part number and bring it into a separate column.

(NOTE: For more advanced users, attempt to minimize the number of manual changes using relevant M code functions)

Power Query Workout9 Extract Part Number

9 Extract Part Number.xlsx (16.4 KB)

This workout was posted on Monday May 22, 2023, and the author’s solution will be posted on Sunday May 28, 2023.

6 Likes
let
    Fonte = Data,
    add = Table.AddColumn(Fonte, "Personalizar", each Text.Split([Summary]," ")),
    exp = Table.ExpandListColumn(add, "Personalizar"),
    ad = Table.AddColumn(exp, "Personalizar.1", each if Text.Select([Personalizar],{"0".."9"}) =  "" then null else Text.Select([Personalizar],{"0".."9"})),
    rs = Table.Group(ad, {"Summary"}, {{"Part Number", each Text.Combine(Table.SelectRows(Table.FillUp(_,{"Personalizar.1"}), each [Personalizar.1] <> null)[Personalizar]," ")}})[[Part Number]]
in
    rs
2 Likes

Adjunto mi query…

let
Source = Excel.CurrentWorkbook(){[Name=“Data”]}[Content],
Sol = Table.AddColumn(Source, “Part Number”, each Text.Combine(
let
a = Text.Split([Summary], " "),
b = Table.FromColumns({a}),
c = Table.AddColumn(b, “New”, each Text.Select([Column1], {“0”…“9”})),
d = Table.TransformColumns(c, {“New”, each if _ = “” then null else _}),
e = Table.SelectRows(Table.FillUp(d, {“New”}), each [New] <> null)
in e[Column1], " "))
in
Sol

1 Like

@Matthias ,

This solves as a one-liner by executing a Regex called from an R script from within Power Query, and then just expanding the result.

Probably the equivalent of using a sledgehammer to swat a fly, but fun nonetheless…

  • Brian

let
  Source = DataRaw, 
  RegexR = Table.SelectColumns(
    R.Execute(
      "library(stringr)#(lf)df <- 
          data.frame(dataset, stringsAsFactors = FALSE)
          #(lf)df$PartNumber <- sapply(str_extract_all(df$Summary, "".*\\d""), paste, collapse="", "")", 
          [dataset = Source]
    ), 
    "Value"
  ), 
  Expand = Table.ExpandTableColumn(
    RegexR, 
    "Value", 
    {"Summary", "PartNumber"}, 
    {"Summary", "PartNumber"}
  )
in
  Expand

Brian

1 Like

@Matthias Here is the simple one line solution

let
Source = Data,
Result = Table.AddColumn(Source, “Part Number”, each Splitter.SplitTextByCharacterTransition({“0”…“9”}, {" "})([Summary]){0})
in
Result

4 Likes

For a quick one-liner one could use Text.TrimEnd:

let
  Source = 
    Excel.CurrentWorkbook(){[Name = "Data"]}[Content], 
  Text_TrimEnd = 
    Table.AddColumn(
      Source, 
      "PartNumber", 
      each Text.TrimEnd([Summary], {"A" .. "Z", "a" .. "z", " "}), 
      type text
    )
in
  Text_TrimEnd```


![Text.Trimstart|477x253](upload://4lXJGGHeRdL312ZwA9ZaPzwm1FW.png)
6 Likes

Here with my solution.

let
Source = Excel.CurrentWorkbook(){[Name=“Data”]}[Content],
#“Added Custom” = Table.AddColumn(Source, “Custom”, each Text.BeforeDelimiter([Summary]," ",{0,RelativePosition.FromEnd})),
#“Added Part Number” = Table.AddColumn(#“Added Custom”, “Part Number”, each Text.Remove([Custom],{“a”…“z”})),
#“Removed Columns” = Table.RemoveColumns(#“Added Part Number”,{“Custom”}),
#“Changed Type” = Table.TransformColumnTypes(#“Removed Columns”,{{“Summary”, type text}, {“Part Number”, type text}})
in
#“Changed Type”

Workout 009

1 Like
let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    Ans1 = Table.AddColumn(Source, "Part Number", each Text.Replace([Summary],List.Last(Text.SplitAny([Summary],"0123456789")),"")),
    Ans2 = Table.AddColumn(Source, "Part Number", each Text.BeforeDelimiter([Summary],List.Last(Text.SplitAny([Summary],"0123456789")))),
    Ans3 = Table.AddColumn(Source, "Part Number", each Text.Start([Summary],List.PositionOfAny(Text.ToList([Summary]),{"0".."9"},1)+1))
in
    Ans3
2 Likes

Impresionante, sencillo y elegante.!!! Jamás se me habría ocurrido…

2 Likes

Here is my solution to the workout

1 Like

Hello! Here is my solution.
I love how these workouts oblige me to deepen my knowledges! Thanks!
Kind regards
Valeria

let
Source = ExcelW9,
#“Added Custom” = Table.AddColumn(Source, “Part Number”, each Text.Start([Summary],Text.PositionOfAny(([Summary]),{“0”…“9”},Occurrence.Last)+1),type text)
in
#“Added Custom”

3 Likes

Proposal - M-Code in 1 step to get Left-part of each value until last Numeric character

Add_Col = Table.AddColumn( Source, "Part Number", each  
    Text.Start( [Summary],                       // Left until Pos of Last NumChar
		1 + Text.PositionOfAny([Summary], {"0".."9"}, Occurrence.Last)) )
1 Like

Hi Valeria : reviewing others proposals : saw we get same approach in that case
Love to take a quick look afterwards, searching for simplier or different perspective
Always a good way of learning for me.

1 Like

Otra forma de conseguir la solución:

let
Source = Excel.CurrentWorkbook(){[Name=“Data”]}[Content],
Sol = Table.SplitColumn(Source, “Summary”, Splitter.SplitTextByCharacterTransition({“0”…“9”}, {" "}), {“Part Number”, “Summary.2”})[[Part Number]]
in
Sol

1 Like

Hello Eric! Yes and as one of the 3 proposals by borydobon! :slight_smile:
I do the same: after I complete the workout, I look and try to understand others’ solutions to learn!

1 Like

Hello,
thanks to everyone who participated! “Jamás se me habría ocurrido…” or “I look and try to understand others’ solutions to learn” are pretty good quotes in a workout thread. Love it!

I am not trying to add a better solution, just two variants plus one alternative approach.

I very much like the approach with Splitter.SplitTextByCharacterTransition({“0”…“9”}, {" "} but it is not doable just by clicking. The task was to create a separate column with Part Numbers, so we can start off by a right mouse on Summary and selecting Duplicate Column:
image

Then select on the Split Column dropdown By Digit to Non-Digit:
image
In the formula bar then replace
= Table.SplitColumn(#“Duplicated Column”, “Summary - Copy”, Splitter.SplitTextByCharacterTransition({“0”…“9”}, (c) => not List.Contains({“0”…“9”}, c)), {“Summary - Copy.1”, “Summary - Copy.2”, “Summary - Copy.3”, “Summary - Copy.4”, “Summary - Copy.5”})
with
= Table.SplitColumn(#“Duplicated Column”, “Summary - Copy”, Splitter.SplitTextByCharacterTransition({“0”…“9”}, {" "}), {“Part Number”} )

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Duplicated Column" = Table.DuplicateColumn(Source, "Summary", "Summary - Copy"),
    #"Split Column by Character Transition" = Table.SplitColumn(#"Duplicated Column", "Summary - Copy", Splitter.SplitTextByCharacterTransition({"0".."9"}, {" "}), {"Part Number"} )
in
    #"Split Column by Character Transition"

:warning: @Rajesh has a nice one-liner with Splitter.SplitTextByCharacterTransition({“0”…“9”}, {" "}

@lillie72 had an interesting approach with removing {“a”…“z”}. You could select on the Add Column tab Custom Column:
image
Call the New Column Name “Part Number” and add this formula:
= Text.Remove([Summary],{“a”…“z”})

That’s good for the given data, but we do have now trailing blanks which we can remove by Trim. Right mouse Part Number and select Transform > Trim
image

Done:

let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Added Part Number" = Table.AddColumn(Source, "Part Number", each Text.Remove([Summary],{"a".."z"})),
    #"Trimmed Text" = Table.TransformColumns(#"Added Part Number",{{"Part Number", Text.Trim, type text}})
in
    #"Trimmed Text"

Talking about Trim, check out the slim Text.TrimEnd solution from @RickdeGroot. It has the benefit that it works also if the portion after the part number does include capital letters!

Finally let me add one general alternative if you have more than one criterion for extracting or splitting.
Add a Custom Column as above and add an if statement checking which case is contained in the Summary column. In our case if the string starts with a “number” we can extract the text before the first blank and otherwise the text before the second blank:
= Table.AddColumn(Source, “Part Number”, each if List.Contains({“0”…“9”}, Text.Start([Summary],1) ) then Text.BeforeDelimiter([Summary], " ") else Text.BeforeDelimiter([Summary], " ", 1))

With the given data you have other alternatives, but this approach with different cases can be helpful for complex extraction or splitting tasks.

Try it out and make sure to check the other approaches above! => Hope you learn something!

Big thanks to everyone who has blurred or hidden the details! I have taken this now out so that it’s easier to compare solutions.

6 Likes