# Extract the first number(s) from a text string using a calculated column

Hi,

For my data set, I need to do a breakdown of sales by the model size. Each model or tonnage size is represented by a 2 digit number.

Example:
09 = 0.75 Ton
12= 1 Ton
15=1.25 Ton
18= 1.5 Ton
24= 2 Ton
30=2.5 Ton

The 2 digit number is contained within the product nomenclature string
Example:VPB09A1DEF31HFL
However, it is not always in the same location. Therefore, I cannot just do a simple “LEFT” or "MID function. I need formula that can find the first instance of a number in the product nomenclature string, then return that value, along with the digit to the right of it.

I have figured out a way to do this using excel formula. See attached file:test.csv (28.2 KB)

By the way, I received guidance on how to do this from this website, in case you need to understand the method behind the madness: https://exceljet.net/formula/split-text-and-numbers

I have tried to use this same formula in column H from the Excel spreadsheet in PowerBI as a calculated column and it will not let me. I receive an error message:

Does anyone know how I can do this using DAX? Is a calculated column the way to go? I was thinking if I can successfully create a column that determines the model size, I can easily filter and do a breakdown of sales by each model/tonnage size.

Thanks!
Matt

Definitely recommend doing this in Power Query. The Split Column function makes this task a breeze.

I did it all in a few steps from the user interface, but here’s the full M code. I’ve also attached my solution file so you can look at the Power Query steps.

``````let
Source = Csv.Document(File.Contents("C:\Users\brjul\Downloads\test.csv"),[Delimiter="	", Columns=8, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Ship Amount", Currency.Type}, {"VP-Series Revenue", Currency.Type}, {"Ship Date", type datetime}, {"Product Nomenclature", type text}, {"Specific Product Type", type text}, {"First number location", Int64.Type}, {"Tonnage Size", Int64.Type}, {"Tonnage Size (combined formula)", Int64.Type}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Changed Type", "Product Nomenclature", "Product Nomenclature - Copy"),
#"Split Column by Character Transition" = Table.SplitColumn(#"Duplicated Column", "Product Nomenclature - Copy", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"Product Nomenclature - Copy.1", "Product Nomenclature - Copy.2", "Product Nomenclature - Copy.3", "Product Nomenclature - Copy.4", "Product Nomenclature - Copy.5"}),
#"Split Column by Character Transition1" = Table.SplitColumn(#"Split Column by Character Transition", "Product Nomenclature - Copy.2", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Product Nomenclature - Copy.2.1", "Product Nomenclature - Copy.2.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Split Column by Character Transition1",{"Product Nomenclature - Copy.1", "Product Nomenclature - Copy.2.2", "Product Nomenclature - Copy.3", "Product Nomenclature - Copy.4", "Product Nomenclature - Copy.5"})
in
#"Removed Columns"
``````

Hope this is helpful.

1 Like

Ah that worked great! Thank you so much @BrianJ
I’ll have to remember to use the PowerQuery more often. It’s a powerful tool

Great – glad that worked for you.

I’m the same way. My first instinct is always to hit every problem with a bunch of DAX code. However, watching the Power Query wizardry of @Melissa, @Nick_M, @Paul and others on the forum has inspired me to take a step back and see if the problem can be addressed by PQ, which often has the added benefit of speeding up the report once the data is loaded, compared to a DAX-based solution.

• Brian