I have a Power Automate Desktop (PAD) flow that reads AWS invoice PDFs and extracts a few key fields (Account Number, Invoice Number, Original Invoice Number, PO Number, Total Amount, Invoice Date, Billing Period). It writes the extracted values to a CSV that feeds a Power BI report. For the most part it works well, but I struggle with the date fields, Invoice Date and Billing Period: the extracted value always includes one or more extra characters from the next line. I have tried regular expressions and Trim, but I just can't get the values perfectly clean in PAD. For now, I clean the entries in Power BI, either by replacing them manually in Transform or by modifying the M query, but that doesn't fix the issue at the source. I'm wondering whether the only way to clean this reliably is to use a script. Initially I was using cloud flows, including AI Builder, but could never get past 49%. Please see the attached PDF for screenshots; all numbers have been masked out but keep their actual character lengths. Any suggestions are greatly appreciated!
PAD_PDF-Invoice-Flow-screenshots.pdf (343.9 KB)
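
To make the script idea concrete, here is a minimal Python sketch of one possible cleanup: rather than trimming the stray characters off the end, it keeps only the part of the extracted text that looks like a date. The date layouts ("January 2, 2024" and "January 1 - January 31, 2024") are assumptions, not taken from the actual invoices, and the names `DATE_RE`, `PERIOD_RE`, and `clean_field` are hypothetical. The patterns would need adjusting to whatever the PDFs really contain.

```python
import re

# ASSUMPTION: dates are written like "January 2, 2024". Adjust if the
# invoices use another layout (for example 01/02/2024).
MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")

# A single date, e.g. "January 2, 2024".
DATE_RE = re.compile(MONTHS + r"\s+\d{1,2}\s*,\s*\d{4}")

# ASSUMPTION: a billing period looks like "January 1 - January 31, 2024",
# with an optional year after the first date and a plain hyphen between them.
PERIOD_RE = re.compile(
    MONTHS + r"\s+\d{1,2}(?:\s*,\s*\d{4})?\s*-\s*"
    + MONTHS + r"\s+\d{1,2}\s*,\s*\d{4}"
)


def clean_field(raw, pattern):
    """Return the first match of pattern in raw, or "" if nothing matches.

    Anything after the match (such as characters pulled in from the
    next line of the PDF) is dropped, and runs of whitespace inside
    the match are collapsed to single spaces.
    """
    match = pattern.search(raw)
    if match is None:
        return ""
    return " ".join(match.group(0).split())
```

For example, `clean_field("January 2, 2024\nTo", DATE_RE)` keeps only `January 2, 2024`. Matching the expected shape of the value, instead of trimming a fixed number of characters, means the result does not depend on how many stray characters follow the date.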