Transforming Data by parsing, transposing and pivoting data from a text column

@AntrikshSharma,

And then ABBA is like, “Hey!, remember us?”…
image

@Melissa - thanks very much for the alternate solution. I definitely need to get more comfortable dealing with record types, which is why I think I sometimes have kludgy multi-table workarounds. It will be helpful working through your solution with the focus on that aspect.

  • Brian

This is what I got with THAT :face_vomiting: DAX:

image

Table = 
VAR Temp =
    ADDCOLUMNS (
        ADDCOLUMNS (
            VALUES ( Ibesmond[TEXT] ),
            "@Clean String",
                SUBSTITUTE ( SUBSTITUTE ( SUBSTITUTE ( SUBSTITUTE ( SUBSTITUTE ( SUBSTITUTE ( Ibesmond[TEXT], " ; ", "^" ), ";" , "^"), "$", "^" ), " ", "^" ), "^^", "^" ), "AA/BB", "AA^BB" )
        ),
        "@Word Count",
            1 + LEN ( [@Clean String] ) - LEN ( SUBSTITUTE ( [@Clean String], "^", "" ) )
    )
VAR UpperBound =
    MAXX ( Temp, [@Word Count] )
VAR GenerateSerialNumber =
    SELECTCOLUMNS ( GENERATESERIES ( 1, UpperBound, 1 ), "@Serial Number", [Value] )
VAR TempCrossJoin =
    CROSSJOIN ( Temp, GenerateSerialNumber )
VAR FilterSerialLessThanDelimiterCount =
    FILTER ( TempCrossJoin, [@Serial Number] <= [@Word Count] )
VAR SplitString =
    ADDCOLUMNS (
        FilterSerialLessThanDelimiterCount,
        "@Extracted Child", PATHITEM ( SUBSTITUTE ( [@Clean String], "^", "|" ), [@Serial Number] )
    )
VAR FinalTable =
    SELECTCOLUMNS (
        SplitString,
        "@Final String",
            VAR CurrentValue = [@Extracted Child]
            VAR Result =
                IF (
                    LEFT ( CurrentValue, 1 ) = "/",
                    MID ( CurrentValue, 2, LEN ( CurrentValue ) ),
                    CurrentValue
                )
            RETURN
                Result
    )
VAR Result =
    ADDCOLUMNS (
        SUMMARIZE ( FinalTable, [@Final String] ),
        "ValueCount",
            VAR CurrentValue = [@Final String]
            VAR Result =
                COUNTROWS ( FILTER ( FinalTable, [@Final String] = CurrentValue ) )
            RETURN
                Result
    )
RETURN
    Result

Look at the number of times I have used Final out of lack of creativity, reminds me of using Excel naming convention lazy people like me/I use “copy of copy of Project Final v3 (2).xlsx”

3 Likes

@AntrikshSharma,

Hmmm…I thought you were just kidding about ABBB, but I see it is now missing. Trying to figure out what happened to it, since it seems to disappear in the Split Column by Delimiters step…

As always, that DAX is brilliant. I’ve never seen that nested SUBSTITUTE construct before, and I’m still working through how you extracted the dates. That one’s definitely getting it’s own Analyst Hub entry.

How long did it take you to write that beast?

  • Brian
1 Like

I have known the logic of splitting values and transposing them for a while now.

I was around the solution in 5 - 10 minutes but then I checked both your and Melissa’s solution and I thought “ahh never mind I am never going to solve it”, then I imported only 1 row and thought of counting everything manually and regained some confidence. I almost spent an hour easily on this. This DAX is just like that horse :stuck_out_tongue:

PATHITEM is the strongest leg, SUBSTITUTE is the weakest one!! haha

2 Likes

@Melissa,

This is now driving me crazy. In the Split Column by Delimiter step, Row 3 is the only one that doesn’t parse properly to the end, dropping a blank in column Text.21 instead of ABBB:

image

For the life of me, I can’t figure out why this is happening, since all the other rows seem to parse fine, and this row parses fine right up until that last value. Any thoughts on what’s going on here?

Thanks.

  • Brian

Discovering all sorts of gems here. Really like the nested ADDCOLUMNS that allows you to call a virtual column within the same table you created it. I’ve been doing this via sequential variables, but this is much cleaner.

One question - any objection to using the “@” prefix to denote a virtual column to distinguish it from a measure, as advocated by Ferrari? We’re trying to standardize on that approach in the forum solutions involving virtual tables.

  • Brian
1 Like

@BrianJ Try this in PQ, logic is similar to SWITCH.

= Table.SplitColumn(#“Added Conditional Column”, “TEXT”, Splitter.SplitTextByAnyDelimiter({"; “, “/”,”;", " “,”$"}))

No objections, I was exhausted and didn’t pay attention to any of the best practices :sleeping:. I still have to rename the variables and think hard if I can come up with something better.

I simply love the “@” for virtual columns.

Seems I’ve missed some good conversation going on right here while I was away…

I can report that @AntrikshSharma nailed it :+1:
We both missed the delimiter "; “ in the list argument of Splitter.SplitTextByAnyDelimiter

@AntrikshSharma,

Nice catch - thanks!

@ibesmond - Great question, leading to an incredibly interesting and informative thread.

  • Brian

@Melissa and @AntrikshSharma,

Power Query is not out of the woods yet. I applied @AntrikshSharma’s fix, which correctly now picks up one ABBB, but one AAAA is still not getting picked up relative to DAXosaurus Rex’s results (which are correct). The highlighted row should total 11:

image

I’ve isolated the problem to Row 2 at the Split by Any Delimiter step:

image

But again am stuck on why this is happening, and it’s not picking up the second AAAA after the same delimiter as the first…

  • Brian

eDNA Forum- Count Delimited Text.pbix (22.8 KB)

@BrianJ for the without dates try this:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("lZBBDsMgDAS/gqwekdKmTXvIaf0NlP9/I2ZtQnMIUg4IvGIGm1LkKVmAlFTTmoDYjqPXCq2J5wARVdlykZfz7Sam24a5G7idmuiCSNQoh7MFtlzyZtifVwy6+O+BipB8QrLe+A1DyC4X7Jj2YSyqw9DzpecR9CEbTNI6+JFkte0=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, TEXT = _t, RESULT = _t]),
    #"Removed Columns" = Table.RemoveColumns(Source,{"RESULT"}),
    #"Replaced Value" = Table.ReplaceValue(#"Removed Columns"," ; ","^",Replacer.ReplaceText,{"TEXT"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","/","^",Replacer.ReplaceText,{"TEXT"}),
    #"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1","; ","^",Replacer.ReplaceText,{"TEXT"}),
    #"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2"," ","^",Replacer.ReplaceText,{"TEXT"}),
    #"Replaced Value4" = Table.ReplaceValue(#"Replaced Value3","$","^",Replacer.ReplaceText,{"TEXT"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value4", "TEXT", Splitter.SplitTextByDelimiter("^", QuoteStyle.Csv), {"TEXT.1", "TEXT.2", "TEXT.3", "TEXT.4", "TEXT.5", "TEXT.6", "TEXT.7", "TEXT.8", "TEXT.9", "TEXT.10"}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"ID"}, "Attribute", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Unpivoted Other Columns", each ([Value] <> "")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Value"}),
    #"Grouped Rows" = Table.Group(#"Removed Other Columns", {"Value"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows" 

image

working on the with dates part.

@AntrikshSharma and @BrianJ,

I’ve got another solution and that’s using: Table.TransformColumns instead of Table.SplitColumn
I’ll post a full update later.

Table.TransformColumns(
    Source[[TEXT]],{{"TEXT", Splitter.SplitTextByAnyDelimiter({"; ", "/", ";", "$", " "}, QuoteStyle.Csv, false), 
    type text}}
)
2 Likes

@BrianJ Here is a solution for “with dates”:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("lZFdCoQwDISvUoqPQkzrLz4l15De/xo7TVp9WFZYqGjGzsdMe11ximMUCUE1nIEpZUoT70HEBJGzf/qsolVxXcTMqrGMV2QMcFf/NuDv7RH6m5U8Vd1pr+Zj0GfiY1067oE1RUFw0AgBjwOziU8UlZdELdWMKljJj0gN1nBzw90Ve0M6KDES7ttLV5iNsvygvJ+UF4RUCxpnNc7Q3DesU3Cta+3B3y17ks0JPFHmXjmW8gE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, TEXT = _t, RESULT = _t]),
    #"Removed Columns" = Table.RemoveColumns(Source,{"RESULT"}),
    #"Replaced Value" = Table.ReplaceValue(#"Removed Columns"," ; ","^",Replacer.ReplaceText,{"TEXT"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","; ","^",Replacer.ReplaceText,{"TEXT"}),
    #"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1"," ","^",Replacer.ReplaceText,{"TEXT"}),
    #"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2","$","^",Replacer.ReplaceText,{"TEXT"}),
    #"Replaced Value4" = Table.ReplaceValue(#"Replaced Value3",";","^",Replacer.ReplaceText,{"TEXT"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value4", "TEXT", Splitter.SplitTextByDelimiter("^", QuoteStyle.Csv), {"TEXT.1", "TEXT.2", "TEXT.3", "TEXT.4", "TEXT.5", "TEXT.6", "TEXT.7", "TEXT.8", "TEXT.9", "TEXT.10", "TEXT.11", "TEXT.12"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"ID", Int64.Type}, {"TEXT.1", type text}, {"TEXT.2", type text}, {"TEXT.3", type text}, {"TEXT.4", type text}, {"TEXT.5", type text}, {"TEXT.6", type text}, {"TEXT.7", type text}, {"TEXT.8", type text}, {"TEXT.9", type text}, {"TEXT.10", type text}, {"TEXT.11", type text}, {"TEXT.12", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if ( List.Count ( Text.Split([TEXT.1], "/") ) - 1 ) = 1 then Text.Split([TEXT.1], "/"){0} else [TEXT.1]),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each if ( List.Count ( Text.Split([TEXT.1], "/") ) - 1 ) = 1 then Text.Split([TEXT.1], "/"){1} else null),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.2", each if [TEXT.5] = null then null else
if ( List.Count ( Text.Split([TEXT.5], "/") ) - 1 ) = 1 then Text.Split([TEXT.5], "/"){0} else [TEXT.5]),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Custom.3", each if [TEXT.5] = null then null else
if ( List.Count ( Text.Split([TEXT.5], "/") ) - 1 ) = 1 then Text.Split([TEXT.5], "/"){1} else null),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom3",{"TEXT.1", "TEXT.5"}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Removed Columns1", {"ID"}, "Attribute", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Unpivoted Other Columns", each ([Value] <> "")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Value"}),
    #"Grouped Rows" = Table.Group(#"Removed Other Columns", {"Value"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Count", Order.Descending}}),
    #"Added Custom4" = Table.AddColumn(#"Sorted Rows", "Custom", each if Text.StartsWith([Value], "/") then Text.Middle([Value], 1, Text.Length([Value])) else [Value]),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom4",{"Value"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns2",{"Custom", "Count"})
in
    #"Reordered Columns"

image

1 Like

@AntrikshSharma and @Melissa,

Thanks for the alternative solutions, but I’m also trying to understand why splitbyanydelimiter fails in this particular scenario? The Microsoft documentation is less than no help. If only my preorder copy of The Definitive Guide to M had arrived yet…

  • Brian

@BrianJ I have been silently watching this post and learning the crazy crazy stuff unleashed by you, @AntrikshSharma & @Melissa. The one part that grabbed my attention is

Is there any such book?

@MudassirAli,

Unfortunately not. At this point it’s just a running joke that we’re going to start a KickStarter campaign to bribe @Melissa to write it.

  • Brian
1 Like

@BrianJ I actually searched for it in excitement. If @Melissa writes it, it’s going to be a very expensive guide for sure.

LOL

@BrianJ the issue is with Table.SplitColumn and not with splitbyanydelimiter, that’s why I changed to Table.TransformColumns. My solution post above is now updated.

You’ll want to check out this video that illustrates the issue.

@BrianJ Looks like there is some kind of column limitation, check this, I imported only 1 row and columns # goes till 15 but yours stopped at 14 with full data:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJ0VLBWcHICEo6O+lAahMFMCN/J0QkkAhF3dARrcnJSio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, TEXT = _t, RESULT = _t]),
    #"Removed Columns" = Table.RemoveColumns(Source,{"RESULT"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Removed Columns", "TEXT", Splitter.SplitTextByAnyDelimiter({"; ", " ", "/", "$"}))
in
    #"Split Column by Delimiter" 

your solution with 14 columns: