Transformation power query

@Melissa
Would you please help.

This is example of what I have :
image

And I need to recreate but grouping is still my big problem.

duty example1.csv (1002 Bytes)

Many thanks,

Iwona

Hi @Iwona,

See if this meets your requirement.
Just copy this code to a new blank query.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("xVpNb9wgEP0vPkeEGcA21ypSLlUrpR9qFeXYcw/9/1KBwfaAnbIG1pWyy84mCxu/N2+Gh19fh6fhYQD3QCn9APCo4BElShd8AP/Wp8/u6emFfu1fftvi9efnrz/+T3+///T28Dp8jFN8dY/nH4j+tTQCJhpxCu/7YBIaQvD88sXHUkjJ1zv4WRbAbQEz+s/OQuk4zssCgAL4Aj4eb1jgu58o/hejkKWP/HsSpZR/JYxqnMaEaaSumoYxAC9nwEjI68gEAytAUqgODEBknwVYKBZB10LrLLYnSTALwD4kqOMSQ09djh5G9GL+mjV/3ZUO6dSav5ajp4RMUtZQ/rEY4IY1OHqTUCXAb0EPhDaN04RLKGTdt2Ek0HsSjHclgYF3SeBIje0kQD0kKWySFHag6yw+n8JBFdpJIPuQAOq+DSOBuZwEOpLARhLYhQTSEiCtSjBxEsRrxHQ7VvZNx0slda8ElQK80/G5Fb3xcvTCxR0JKDfiip670oFMrejBkKSw5LLgUhYzXcezrZirwqWvdSN6zbk3XY7eTKipiJ7m6CVXthY9laKnMvSMzOKSDO7Ra1ZOEuBi/S9NE6tw3TSMBPPlJAgYTTGFJ57CrulVHUgwchJg2t7BSFssHv+nKtxhNxVJUEdJRgK7J8Gd+3EdwZ9oVLwf17xi1rZiSRVGoRMlyPbsPj6r41Zg3SZ2R4IerRi0b6lBXk4CIPDDtXejyWyOHjwwSUuuaWTJLzMxKG5r9t1Yl3ruKkKdGDAAD1yxO0u5iQBGKTdcyk2HXTUm3RhmG6ppWcMHc9ni2qdw0Ui7DbrKKrxL4Tp7hjHgelcsyKy79opGUCsgmHZftQxIOjqVuWJz6rz5+DwJVN026E46Dq2bMjgw1+4sA5bAD1i4UW/WKPaRAZOSwGQkQDNsMqDO9vSW2NvKgE4yIMvGbpEBB87anWXARAZEOdBbO6fIcGwt4/OQlHHDjRqwVINpQdeY3bJA1sv18VahtSUkDcBmDTiw1e7MgFgAdGSC2VJSdTHY0+MRTeYVZ8C2oCVBOMOAudYM27nrrUcskQGtWzq43poLAFkCSCYpqcgm62rNmcU4WBmgsliXknFPAt2q4J4E2F5PIgmaC8H1Dp8m8KkVsNSOrZioRRY6nrIYUstlTnT1U2bx+YagCw9AqD48MM08uN7kCytKatf9AXliv+g+p22Jz2fSiuBw3/YiLsCzpywzmcXtJGi+8yJ2ha1OL1xv8o0R/EgGzc1dB1gSdzF7x/QuCYe7MsNeHM7wwLa39X13B63HPnjg811gEy2OkNsSpvdGjF1OXVO/d059PgTy9Vhc7NQP7r/oc/TeXFui6V8hK29/AQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t, Column6 = _t, Column7 = _t, Column8 = _t, Column9 = _t, Column10 = _t, Column11 = _t, Column12 = _t, Column13 = _t, Column14 = _t, Column15 = _t, Column16 = _t, Column17 = _t, Column18 = _t, Column19 = _t, Column20 = _t]),
    AddedGroup = Table.FillDown( Table.ReplaceValue( Table.AddIndexColumn(Source, "Group", 1, 1), each [Group], each if [Column1] = "D" then [Index] else null,Replacer.ReplaceValue,{"Group"}), {"Group"}),
    GroupRows = Table.Group(AddedGroup, {"Group"}, {{"Grouped", each Table.RemoveColumns(_, {"Group"} ), type table [Column2=nullable text, Column3=nullable text, Column4=nullable text, Column5=nullable text, Column6=nullable text, Column7=nullable text, Column8=nullable text, Column9=nullable text, Column10=nullable text, Column11=nullable text, Column12=nullable text, Column13=nullable text, Column14=nullable text, Column15=nullable text, Column16=nullable text, Column17=nullable text, Column18=nullable text, Column19=nullable text, Column20=nullable text]}}),
    GetLevels = Table.AddColumn(GroupRows, "AllRows", each Table.Group( [Grouped], {"Column1"}, {{"AllRows", each Table.RemoveColumns( _, "Column1"), type table [Column1=nullable text, Column2=nullable text, Column3=nullable text, Column4=nullable text, Column5=nullable text, Column6=nullable text, Column7=nullable text, Column8=nullable text, Column9=nullable text, Column10=nullable text, Column11=nullable text, Column12=nullable text, Column13=nullable text, Column14=nullable text, Column15=nullable text, Column16=nullable text, Column17=nullable text, Column18=nullable text, Column19=nullable text, Column20=nullable text]}})),
    Result = Table.AddColumn( GetLevels, "nTable", each
        let 
            AddIndex = Table.AddIndexColumn([AllRows], "Index", 0, 1, Int64.Type),
            AddColCount = Table.AddColumn(AddIndex, "ColCount", each if [Column1] = "D" then Table.ColumnCount( [AllRows] ) else List.Count(List.RemoveMatchingItems( Record.ToList( [AllRows]{0}? ), {""} ) ), type number),
            lColCount = List.Buffer(AddColCount[ColCount]),
            UpdatedTables = Table.AddColumn(AddColCount, "Temp", each
                let 
                    StartNum = if [Column1] = "D" then 1 else List.Sum( List.FirstN( lColCount, [Index] )),
                    OldNames = List.Transform( List.Numbers( 2, [ColCount] ), each "Column" & Text.From(_) ),
                    NewNames = List.Transform( List.Numbers( if [Column1] = "D" then 1 else StartNum +1, [ColCount] ), each "Column" & Text.From(_)),
                    nTable = Table.RenameColumns( Table.SelectColumns( [AllRows], OldNames), List.Zip( {OldNames, NewNames } ))
                in 
                    nTable
            )[[Index], [Temp]],
            Pivot = Table.Pivot(Table.TransformColumnTypes(UpdatedTables, {{"Index", type text}}), List.Distinct(Table.TransformColumnTypes(UpdatedTables, {{"Index", type text}})[Index]), "Index", "Temp"),
            Step1 = Table.ExpandTableColumn(Pivot, Text.From(0), Table.ColumnNames( Record.Field( Pivot{0}, Text.From(0) )), Table.ColumnNames( Record.Field( Pivot{0}, Text.From(0) ))),
            Step2 = Table.ExpandTableColumn(Step1, Text.From(1), Table.ColumnNames( Record.Field( Pivot{0}, Text.From(1) )), Table.ColumnNames( Record.Field( Pivot{0}, Text.From(1) )))
        in
            Step2
    )[[nTable]],
    ExpandnTable = Table.ExpandTableColumn(Result, "nTable", List.Distinct( List.Combine( List.Transform( Result[nTable], each Table.ColumnNames( _ )))), List.Distinct( List.Combine( List.Transform( Result[nTable], each Table.ColumnNames( _ )))) ),
    ExtractValues = Table.TransformColumns(ExpandnTable,{{"2", each Text.Combine( List.Combine( List.Zip( Table.ToColumns(_) )), "|"), type text}}),
    SplitCol = 
        let
            NumOfFields = List.Max( List.Transform( ExpandnTable[#"2"], each Table.RowCount(_) * Table.ColumnCount( _ ) )),
            AddColNames = List.Transform( List.Numbers( List.Max( List.Transform( Table.ColumnNames( Table.RemoveLastN( ExpandnTable, 1 )), each Number.From( Text.Select( _, {"0".."9"}))) ) +1, NumOfFields, 1 ), each "Column" & Text.From( _ ))
        in
            Table.SplitColumn(ExtractValues, "2", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), AddColNames)
in
    SplitCol

I hope this is helpful

1 Like

@Melissa
I am getting something like this. I am afraid I don’t know how to do it from scratch. Could I kindly please about guidance.

Just open the Advanced Editor window, select everything you see there and paste the full code from my previous response back in. That should do it.

I hope this is helpful

@Melissa
I have real data on my desktop in .doc format. This file is more than 1000 pages and I need to apply it into your code. How can I achieve that ?
And the result is not correct .
I need :
image

and what I can see is:

Thank you,

Iwona

Oh, everything in a single line… got it. Actually makes the code easier :+1:
See how you get on with this.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("xVpNb9wgEP0vPkeEGcA21ypSLlUrpR9qFeXYcw/9/1KBwfaAnbIG1pWyy84mCxu/N2+Gh19fh6fhYQD3QCn9APCo4BElShd8AP/Wp8/u6emFfu1fftvi9efnrz/+T3+///T28Dp8jFN8dY/nH4j+tTQCJhpxCu/7YBIaQvD88sXHUkjJ1zv4WRbAbQEz+s/OQuk4zssCgAL4Aj4eb1jgu58o/hejkKWP/HsSpZR/JYxqnMaEaaSumoYxAC9nwEjI68gEAytAUqgODEBknwVYKBZB10LrLLYnSTALwD4kqOMSQ09djh5G9GL+mjV/3ZUO6dSav5ajp4RMUtZQ/rEY4IY1OHqTUCXAb0EPhDaN04RLKGTdt2Ek0HsSjHclgYF3SeBIje0kQD0kKWySFHag6yw+n8JBFdpJIPuQAOq+DSOBuZwEOpLARhLYhQTSEiCtSjBxEsRrxHQ7VvZNx0slda8ElQK80/G5Fb3xcvTCxR0JKDfiip670oFMrejBkKSw5LLgUhYzXcezrZirwqWvdSN6zbk3XY7eTKipiJ7m6CVXthY9laKnMvSMzOKSDO7Ra1ZOEuBi/S9NE6tw3TSMBPPlJAgYTTGFJ57CrulVHUgwchJg2t7BSFssHv+nKtxhNxVJUEdJRgK7J8Gd+3EdwZ9oVLwf17xi1rZiSRVGoRMlyPbsPj6r41Zg3SZ2R4IerRi0b6lBXk4CIPDDtXejyWyOHjwwSUuuaWTJLzMxKG5r9t1Yl3ruKkKdGDAAD1yxO0u5iQBGKTdcyk2HXTUm3RhmG6ppWcMHc9ni2qdw0Ui7DbrKKrxL4Tp7hjHgelcsyKy79opGUCsgmHZftQxIOjqVuWJz6rz5+DwJVN026E46Dq2bMjgw1+4sA5bAD1i4UW/WKPaRAZOSwGQkQDNsMqDO9vSW2NvKgE4yIMvGbpEBB87anWXARAZEOdBbO6fIcGwt4/OQlHHDjRqwVINpQdeY3bJA1sv18VahtSUkDcBmDTiw1e7MgFgAdGSC2VJSdTHY0+MRTeYVZ8C2oCVBOMOAudYM27nrrUcskQGtWzq43poLAFkCSCYpqcgm62rNmcU4WBmgsliXknFPAt2q4J4E2F5PIgmaC8H1Dp8m8KkVsNSOrZioRRY6nrIYUstlTnT1U2bx+YagCw9AqD48MM08uN7kCytKatf9AXliv+g+p22Jz2fSiuBw3/YiLsCzpywzmcXtJGi+8yJ2ha1OL1xv8o0R/EgGzc1dB1gSdzF7x/QuCYe7MsNeHM7wwLa39X13B63HPnjg811gEy2OkNsSpvdGjF1OXVO/d059PgTy9Vhc7NQP7r/oc/TeXFui6V8hK29/AQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t, Column6 = _t, Column7 = _t, Column8 = _t, Column9 = _t, Column10 = _t, Column11 = _t, Column12 = _t, Column13 = _t, Column14 = _t, Column15 = _t, Column16 = _t, Column17 = _t, Column18 = _t, Column19 = _t, Column20 = _t]),
    AddGrouping = Table.FillDown( Table.ReplaceValue( Table.AddIndexColumn(Source, "Group", 1, 1), each [Group], each if [Column1] = "D" then [Index] else null,Replacer.ReplaceValue,{"Group"}), {"Group"}),
    FirstGroup = Table.SelectRows(AddGrouping, each ([Column1] = "D")),
    SecondGroup = Table.SelectRows(AddGrouping, each ([Column1] <> "D")),
    ReplaceBlanks = Table.ReplaceValue(SecondGroup,"",null,Replacer.ReplaceValue, Table.ColumnNames( SecondGroup )),
    NewTable = Table.Combine( { FirstGroup, ReplaceBlanks }),
    Unpivot = Table.UnpivotOtherColumns(NewTable, {"Group"}, "Attribute", "Value"),
    DelCol = Table.RemoveColumns(Unpivot,{"Attribute"}),
    Result = Table.Combine( Table.Group(DelCol, {"Group"}, {{"Count", each Table.Skip( Table.Transpose( _ ), 1 ), type table [Group=number, Value=text]}})[Count] )
in
    Result

.

As for connecting to a Word file :thinking:
They haven’t thought of that yet at MSFT… so no Standard Connector available (at this time) BUT this method works (see article below) but READ the instructions carefully!!

I’ve performed these steps for you based on the sample docx you’d provided earlier.
So all you’d have to do now, is add the full File Path string to your actual XML-file to the FileLocation parameter list and select that.

The SourceFile and Solution query will update automatically.

Here are your sample files:
document.xml (85.5 KB)
Duty-example.PBIX (14.2 KB)

I hope this is helpful.

2 Likes

Love how you provided solution on this thread @Melissa. Appreciate your help on it.

We hope this helped you @Iwona:slight_smile:

If not, how far did you get and what kind of help you need further?

If yes, kindly mark as solution the answer that solved your query.

@Melissa , please reccomend…

I have data what I can open in Notepad or Word, should I try to “push it” into Excel and after into PBI or follow this article from your previous post. What is the appropriate steps in these odd sources od data.

Iwona

Hi @Iwona,

Yes if you can push your data to a CSV, TXT or XLSX for example, I’d go for that because it will allow you to use one of the standard connectors.

I hope this is helpful

Hi @Melissa

Could I please for example of this “transition” from Word into excel based on my sample. I need to learn the correct way and CSV when standard features are used .

Iwona

Hi @Iwona,

Not sure if there’s a question there…
If there is, I don’t understand, can you please clarify?

Apologies @Melissa for not being clear with my concern.

I am just wondering , how efficiently I can get this data from .doc file into CSV file ?
I hope this makes more sense now.

Iwona

Hi @Iwona,

I think the most straight forward approach would be not to convert to CSV but to TXT, that’s just one step - you can achieve that as follows:

  1. With the Word file open go to “Save as”
  2. Select “Plain Text” from the options list, this will save your file as a TXT file. Note that you will receive a warning stating that this will cause formatting, pictures and objects in your file to be lost. That’s no problem this is just a copy :wink:

For connecting to a TXT file you can use a standard Connector, see the Source step below.

let
    MaxCols = List.Max( Table.TransformColumns( GroupBy,{{"Column", each List.Count( Splitter.SplitTextByAnyDelimiter({","} , QuoteStyle.Csv, false)(_)), type text}})[Column]),
    Source = Table.FromColumns({Lines.FromBinary(File.Contents(FileLocation), null, null, 1252)}),
    CreateGroupKey = Table.FillDown( Table.ReplaceValue(Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type), each [Index], each if List.Contains( {"D"}, Text.Start( [Column1], 1 )) then [Index] else null,Replacer.ReplaceValue,{"Index"}), {"Index"}),
    RemoveFirstChar = Table.TransformColumns(CreateGroupKey, {{"Column1", each Text.AfterDelimiter(_, ","), type text}}),
    GroupBy = Table.Group(RemoveFirstChar, {"Index"}, {{"Column", each Text.Combine( _[Column1], "," )}})[[Column]],
    ExpandToCols = Table.SplitColumn(GroupBy, "Column", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), List.Transform( {1..MaxCols}, each "Column" & Text.From(_))) 
in
    ExpandToCols 

Here are your sample files:
Duty1-example.txt (3.9 KB)
Duty-example.PBIX (13.0 KB)

I hope this is helpful

Big thank you @Melissa for your patience, time, knowledge and comprehensive explanations.

Iwona

1 Like

Glad I could help
All the best