Latest Enterprise DNA Initiatives

Import from PDF where the PDF layout changes


I know how to import tables from a PDF.

My issue is that some of the pdf files I want to import from may or may not have say Sales data Table and if they do it may not have the same table number or be on a the same page as in the other files.

Can one have Power Query search for a table with a particular heading irrespective as to where it is in the pdf and then import it.

In the attached files I have imported the Sales Table from Report1.pdf but in the second query I manually had to determine which table it was on this occurrence of the Report2.pdf. (It may be that next time Report2 may have a new table inserted above the first table in the report and so the table name will change).

Once I get both tables I then want to merge them.

How can I get PQ to find and import the table I want from each file in a dynamic manner?

Kind Regards


Report1.pdf (16.6 KB)
Report2.pdf (19.0 KB)
Summary.xlsx (22.2 KB)

Hi @AllisterB,

In the new learning portal use the Search option (top right) and enter my name: Melissa
Look for this course: Applied Problem Solving with Power Query/M
and review this chapter: Convert a PDF to a Tabular Format

I’m confident this will help you work through your scenario.

All the best.


Hi Melissa

Thank you for your recommendation.

Are the files used in the course available for me to learn with? If so - where are they?

Kind Regards


There should be a resource pack in the course, please check that - sorry but it is a bit difficult, on mobile…

Hi Melissa

I cannot see the resource pack on the front screen of the course.

I have been working through the course to solve my initial posting. However, I have a page that has a table that only has one column so I get an error when I run the code that moves Column 2 (which doesn’t exist) to Column 1. I have tried using try to solve this.

How would teh cose iin your example change if one of the tables had only one column and it was in Column 1



    Source = Pdf.Tables(File.Contents(FileLocation), [Implementation="1.3"]),
    #"Filtered Rows" = Table.RemoveLastN(Table.SelectRows(Source, each ([Kind] = "Page")), 1 ),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "ColCount", each Table.ColumnCount([Data])),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Data2", each let
            RenameCols = Table.RenameColumns([Data],{{"Column1", "Temp1"}, {"Column2", "Temp2"}}),
            UpdateCol1 = try Table.AddColumn(RenameCols, "Column1", each if [Temp1] = null then [Temp2] else [Temp1]),
            UpdateCol2 = Table.AddColumn(UpdateCol1, "Column2", each if [Temp2] = [Column1] then null else [Temp2]),
            DelCols = Table.RemoveColumns(UpdateCol2,{"Temp1", "Temp2"})
    #"Added Custom1"

HI @AllisterB,

I will inquire with the eDNA team for the course resource pack. Thanks for reporting back :+1:

As for tables with 2 and sometimes 1 column, there is a set of table functions in M that allow for a MissingField.Type parameter, select one of those. These are the options:

Name Value Description
MissingField.Error 0 Indicates that missing fields should result in an error. (This is the default value.)
MissingField.Ignore 1 Indicates that missing fields should be ignored.
MissingField.UseNull 2 Indicates that missing fields should be included as null values.

If that isn’t possible, are you able to generate a list with column names that you should get (at most)?

1 Like

Hi @AllisterB you can now download the resources of the Convert a PDF to a Tabular Format video mentioned by @Melissa, along with all other resources of the course (if needed).


1 Like

This course was so very helpful. once you work it through a couple of times you get to feel as though you are learning techniques that will be useful in solving other issues - thank you Melissa

1 Like


When I l click on the Download button I get the attached.
Download result.docx (126.9 KB)

Appears to be an issue wi ith Firefox as it works with Chrome.