Transform PDF Journal Entry to an excel file in Powerquery

Hello Forum. I receive the attached accounting journal Entry in PDF every month and i would like to transform it into excel. I have tried to clean it many times but it is extremely messy.
I have attached the desired result of the voucher in excel for reference. Thanks in advance
PDF voucher.pdf (56.4 KB)
desired result.xlsx (9.6 KB)

Hi @jazzista1967

I’m not sure what accounting software the company is using. I’m not sure if you have asked the question, would it be possible to get the information in excel format instead of pdf format?

I had a similar situation on one of jobs and they were doing the same thing. I just asked for the raw data.

I’m sure the accounting systems will have that option built in.

Its just a thought instead of going down that road with pdf files.

Thanks
Keith

1 Like

Hi @jazzista1967,

I found this video within the EDNA learning platform that might help you. @Melissa created this video. Please take a look at it.

Extract Data From PDF’s

Events | EDNA

Maybe this will help.

Keith

1 Like

Hi Keith Good evening .See the excel file. This was actually coming from LOTUS. So, you can imagine how old this software is. The excel version is slightly different from the PDF though. I have no control on the output. It comes as is from the accounting software. The file is extremely messy as you can see

voucher.xlsx (43.5 KB)

Hi @jazzista1967
I was just trying to give an option that you might have not thought of. I had no control of the output either in my other positions but i sitll ask the question back to the accounting team to see if they can just give the raw data GL in the format that you need. Most Accounting software has that option

Please take a look at @Melissa video that i stated earlier that might be able to help.

You can also try using DataMentor which is part of the EDNA Learning platform.

There are several youtube videos that will be able to help too.

I hope this help to get what you need for your project.

Thanks
Keith

HI Keith. That was the raw data in an exce format . As you can see , its very messy. Thats why i did not provide that file to begin with. With the PDF , at least you have a decent view of the output.

Hi @jazzista1967,

Based on your XLSX you appear to be dealing with fixed column widths.
All rows have a length of 124 characters, this video may be helpful.

Hi @jazzista1967

welcome back to Forum

As Keith said Melissa made excellent materials and videos for that .

You can also check her Convert a PDF to a Tabular Format lesson
within Applied Problem Solving with Power Query/M course:

for more then 1 hour you can find her excellent step by step explanations with lots of practical details. Excellent lesson that is highly recommended to check.

Thanks. i have not been in the forum for over a year and past experience with this forum is that someone would give the answer and i would simply look at the answer and study the approach and apply it. Also, i have noticed that you need authorization to post a comment. I don’t know if it’s because of the type of membership i have.

HI Melissa. Thanks for your reply. I prefer working out of the PDF file because the excel file provides different columns. The problem that i encounter is that when working on transforming the PDF for some reasons, the values on the last pages are completely omitted, and the JE is out of balance, and it is really frustrated.

Sometimes it is not easy to do it manually, i think. Maybe using some third-party software will help.

@crypticnenad73 , Welcome to Forum.

I agree, but also thanks to Melissa and her easy way teaching M code and her great lectures I learn a lot and some/ lots of things make it possible.

Don’t lose hope and give it a try.

@Melissa - big thanks once again for all M (agic) that you bring.

Regarding the posting aspect. sometimes the posting need to be approved by the EDNA admin team. Sometime I get that that message whenever I’m replying to a message with links or screen prints.

@jazzista1967 got it, when I have a few minutes to spare, I will take a look.

@crypticnenad73 Welcome to the forum, its great to have you here and contributing. :+1:

@mspanic you’re too kind! And honestly, you’re doing a fantastic job yourself, sharing your knowledge and experiences in the broader community! :raised_hands:

1 Like

Haven’t forgotten about you @jazzista1967

Just been crazy busy, quick question is it still unsolved?
Maybe I can find some time this weekend…

1 Like

Hi Melissa. Thats fine. I have not been asked at work about this file yet. My idea is when i receive the file on my monthly end close I would simply would to create a function with the code. The tricky part are the last pages for some reasons.
Regards