Clean Tax data from a TXT file in power query

Hello EDNA Forum. I have a TXT file that contains tax data that is a complete mess. This is a summary of a much larger data set containing almost 500 records.
The challenge is unwinding the data, so it shows for each vendor a row with several columns. The result that i am after is below. I really don’t know how to unwind this data specially bringing the columns in the correct format. Any help is greatly appreciated. Thanks in advance and merry Christmas
output result in excel.xlsx (17.1 KB)

report to clean.txt (2.4 KB)

Vendor Name Address 1 Address 2 Address 3 Address 4 FED ID AMT Type Status 1099- AMT
DES100 Juan W. Doe1 480 Oakwood Circle Anywhere1 TX 75069 455-98-9245 Rents No-1099-flag overridden $105,141.96
GOO100 William Harrison Garcia 11 Plaza Suite 2010 Anywhere2 OK 97046 MISSING FID Non-Empl. No-1099-flag overridden $154,070.97
LOV100 David Lewis 41 Northampton Pl Anywhere3 FL 27098 297-82-2172 GrossAttrn $154,234.87

Hello Luis,

is this worse than the Accounts Payable cleanup?

Quite some efforts to achieve the desired result:

Thought that it was a Split Column by Position problem, but it is only for the very first column. But at least there it is very helpful:

let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\yourPath\report to clean.txt"), null, null, 1252)}),
    #"Removed Top Rows" = Table.Skip(Source,10),
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Removed Top Rows",3),
    #"Filtered Rows" = Table.SelectRows(#"Removed Bottom Rows", each ([Column1] <> "" and not Text.Contains([Column1],"------"))),
    #"Split Column by Position" = Table.SplitColumn(#"Filtered Rows", "Column1", Splitter.SplitTextByPositions({0, 7}, false), {"Column1.1", "Column1.2"}),
    #"Split Column by Position1" = Table.SplitColumn(#"Split Column by Position", "Column1.2", Splitter.SplitTextByPositions({0, 43}, false), {"Column1.2.1", "Column1.2.2"}),
    #"Replaced Value" = Table.ReplaceValue(#"Split Column by Position1","1099-Amt.       FederalW/H      State W/H","$1099-Amt.",Replacer.ReplaceText,{"Column1.2.2"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Column1.2.2", Splitter.SplitTextByDelimiter("$", QuoteStyle.Csv), {"Column1.2.2.1", "Column1.2.2.2"}),
    #"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Column1.1", Text.Trim, type text}, {"Column1.2.1", Text.Trim, type text}, {"Column1.2.2.1", Text.Trim, type text}, {"Column1.2.2.2", Text.Trim, type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Trimmed Text", [PromoteAllScalars=true]),
    #"Replaced Value1" = Table.ReplaceValue(#"Promoted Headers","",null,Replacer.ReplaceValue,{"Vendor"}),
    #"Filled Down" = Table.FillDown(#"Replaced Value1",{"Vendor", "1099-Amt."}),
    #"Grouped Rows" = Table.Group(#"Filled Down", {"Vendor", "1099-Amt."}, {{"Name", each [#"Name / Address"]{0}}, {"Address 1", each Text.Combine(List.Skip(List.RemoveLastN([#"Name / Address"],1)), " ")}, {"Address 2", each List.Last([#"Name / Address"])}, {"Federal ID", each [Federal ID    Amt.Type]{0}}, {"Status", each [Federal ID    Amt.Type]{1}}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Grouped Rows", "Address 2", Splitter.SplitTextByEachDelimiter({","}, QuoteStyle.Csv, true), {"Address 2", "Address 3"}),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Split Column by Delimiter1", "Address 3", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Address 3", "Address 4"}),
    #"Split Column by Delimiter3" = Table.SplitColumn(#"Split Column by Delimiter2", "Federal ID", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"Federal ID", "Amt.Type"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Split Column by Delimiter3",{"Vendor", "Name", "Address 1", "Address 2", "Address 3", "Address 4", "Federal ID", "Amt.Type", "Status", "1099-Amt."}),
    #"Trimmed Text1" = Table.TransformColumns(#"Reordered Columns",{{"Vendor", Text.Trim, type text}, {"Name", Text.Trim, type text}, {"Address 1", Text.Trim, type text}, {"Address 2", Text.Trim, type text}, {"Address 3", Text.Trim, type text}, {"Address 4", Text.Trim, type text}, {"Federal ID", Text.Trim, type text}, {"Amt.Type", Text.Trim, type text}, {"Status", Text.Trim, type text}, {"1099-Amt.", Text.Trim, type text}}),
    #"Changed Type with Locale" = Table.TransformColumnTypes(#"Trimmed Text1", {{"1099-Amt.", Currency.Type}}, "en-US")
in
    #"Changed Type with Locale"

Regards,
Matthias

2 Likes

HI Matthias… How are you and Merry Christmas… Oh my god… It worked like a charmmm in the entire data set !!! Thank you very much for your help… I have spent at least 3 days on trying to make the report work… I was able to get and clean just a handful of columns ( The amount column and the ID column: literally the first and the last columns of the TXT)… Quick questions:

  1. Did you create the solution using the GUI or did you create your own functions?
  2. How long did it take you to solve this?
  3. How can i learn the steps you made or at least the reasoning. How did you approach the problem ?

And yes… it is worst that the Accounts payable report you helped solved once… the data comes from the same accounting software (OGSYS) .

Again, thanks a lot for solving this.

Ps–> the Ap report you helped with a while back for some reason it works sometimes and sometimes if does not. I wonder if you could you take a look at the txt report.

Merry Christmas, Luis!

It is good to hear that it worked on the complete data set*. :+1:

The first and the last column are important!

  1. If you look at the step names, you see that ALL steps are UI driven. But I made major adjustments in the grouping step plus name changing in some of the splitting steps.
  2. Longer then I expected (which is normal), but not too long.
  3. The blanks in the first dotted line misled me into guessing that some columns were defined by character length. That’s how I started off. #“Split Column by Position1” already wasn’t really a perfect solution but it was good enough. Getting the last column could have been done in various ways, and I thought using the $ would be the safest and most convenient. So far that’s all “manual work”, but then I had to think, cause there were no ready made solutions in my repertoire. Thought about if pivoting was viable (not really), and that index with custom column extracting by position could be used. => Easier than that would be to tweak grouping to get the first and second resp. first, last and the middle positions. It worked out fine!

It remeinded me of that AP task, so it makes sense that it is from OGSYS too. :blush:

*There is always the possibility that there are cases, which were not included in the sample data. That could be the case also regarding the AP task. I remember that you did not select my solution. => There is a chance that you did not apply my solution. In that case you can try mine and perhaps you are lucky:

let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\yourPath\ap340.txt"), null, null, 1252)}),
    #"Filtered PageHeaders" = Table.SelectRows(Source, each ([Column1] <> "" 
        and not Text.Contains([Column1], "Printed on:")
        and not Text.Contains([Column1], "A/P Detail List")
        and not Text.Contains([Column1], "Date Filter:")
        and not Text.Contains([Column1], "UnPaid(FSP)")
        and not Text.Contains([Column1], "------")
        and not Text.Contains([Column1], "Code      Description            Number") )),
    #"Added Invoice" = Table.AddColumn(#"Filtered PageHeaders", "Invoice", each if Text.Contains([Column1], "$") and Text.Contains([Column1], "/") then Text.End([Column1], 101) else null),
    #"Inserted First Characters" = Table.AddColumn(#"Added Invoice", "Details", each if [Invoice]=null then [Column1] else Text.Start([Column1], Text.Length([Column1])-101), type text),
    #"Shifted Invoice" = Table.FromColumns(
        Table.ToColumns(#"Inserted First Characters") & {{null} & List.RemoveLastN(#"Inserted First Characters"[Invoice],1)},
        Table.ColumnNames(#"Inserted First Characters") & {"Invoice.1"} ),
    #"Trimmed Details" = Table.TransformColumns(#"Shifted Invoice",{{"Details", Text.Trim, type text}}),
    #"Added Custom" = Table.AddColumn(#"Trimmed Details", "Custom", each if [Details]<> "" and [Invoice]<>null then [Invoice] else [Invoice.1], type text),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Invoice", "Invoice.1"}),
    #"Added Vendor Code" = Table.AddColumn(#"Removed Columns1", "Vendor Code", each if Text.StartsWith([Details], "Vendor") and Text.Contains([Details], "Total:") then Text.BetweenDelimiters([Details], " ", " ") else null),
    #"Filled Up" = Table.FillUp(#"Added Vendor Code",{"Vendor Code"}),
    #"Added Vendor" = Table.AddColumn(#"Filled Up", "Vendor", each if Text.StartsWith([Details], [Vendor Code] & " ") then Text.AfterDelimiter([Details], [Vendor Code]& " ") else null),
    #"Filled Down" = Table.FillDown(#"Added Vendor",{"Vendor"}),
    #"Filtered Invoice Lines" = Table.SelectRows(#"Filled Down", each ([Custom] <> null and [Details]<>"" and not Text.Contains([Details], "Total:"))),
    #"Split Column by Positions" = Table.SplitColumn(#"Filtered Invoice Lines", "Custom", Splitter.SplitTextByPositions({0, 13, 23, 30, 39, 48, 64, 75, 84, 100, 102}), {"Invoice Number",	"Invoice Date",	"Voucher Number",	"Payment Due Date",	"Transaction Date",	"Original Amount",	"Discount", "byDate",	"Balance Now due",	"Pay"}),
    #"Trimmed Text" = Table.TransformColumns(#"Split Column by Positions",{{"Invoice Number", Text.Trim, type text}, {"Invoice Date", Text.Trim, type text}, {"Voucher Number", Text.Trim, type text}, {"Payment Due Date", Text.Trim, type text}, {"Transaction Date", Text.Trim, type text}, {"Original Amount", Text.Trim, type text}, {"Discount", Text.Trim, type text}, {"byDate", Text.Trim, type text}, {"Balance Now due", Text.Trim, type text}, {"Pay", Text.Trim, type text}, {"Vendor", Text.Trim, type text}}),
    #"Changed Type with Locale" = Table.TransformColumnTypes(#"Trimmed Text", {{"Original Amount", Currency.Type}, {"Discount", Currency.Type}, {"Balance Now due", Currency.Type}}, "en-US")
in
    #"Changed Type with Locale"

Regards,
Matthias