My issue is that some of the pdf files I want to import from may or may not have say Sales data Table and if they do it may not have the same table number or be on a the same page as in the other files.
Can one have Power Query search for a table with a particular heading irrespective as to where it is in the pdf and then import it.
In the attached files I have imported the Sales Table from Report1.pdf but in the second query I manually had to determine which table it was on this occurrence of the Report2.pdf. (It may be that next time Report2 may have a new table inserted above the first table in the report and so the table name will change).
Once I get both tables I then want to merge them.
How can I get PQ to find and import the table I want from each file in a dynamic manner?
In the new learning portal use the Search option (top right) and enter my name: Melissa
Look for this course: Applied Problem Solving with Power Query/M
and review this chapter: Convert a PDF to a Tabular Format
I’m confident this will help you work through your scenario.
I cannot see the resource pack on the front screen of the course.
I have been working through the course to solve my initial posting. However, I have a page that has a table that only has one column so I get an error when I run the code that moves Column 2 (which doesn’t exist) to Column 1. I have tried using try to solve this.
How would teh cose iin your example change if one of the tables had only one column and it was in Column 1
Regards
Allister
let
Source = Pdf.Tables(File.Contents(FileLocation), [Implementation="1.3"]),
#"Filtered Rows" = Table.RemoveLastN(Table.SelectRows(Source, each ([Kind] = "Page")), 1 ),
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "ColCount", each Table.ColumnCount([Data])),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Data2", each let
RenameCols = Table.RenameColumns([Data],{{"Column1", "Temp1"}, {"Column2", "Temp2"}}),
UpdateCol1 = try Table.AddColumn(RenameCols, "Column1", each if [Temp1] = null then [Temp2] else [Temp1]),
UpdateCol2 = Table.AddColumn(UpdateCol1, "Column2", each if [Temp2] = [Column1] then null else [Temp2]),
DelCols = Table.RemoveColumns(UpdateCol2,{"Temp1", "Temp2"})
in
DelCols)
in
#"Added Custom1"
I will inquire with the eDNA team for the course resource pack. Thanks for reporting back
As for tables with 2 and sometimes 1 column, there is a set of table functions in M that allow for a MissingField.Type parameter, select one of those. These are the options:
Name
Value
Description
MissingField.Error
0
Indicates that missing fields should result in an error. (This is the default value.)
MissingField.Ignore
1
Indicates that missing fields should be ignored.
MissingField.UseNull
2
Indicates that missing fields should be included as null values.
.
If that isn’t possible, are you able to generate a list with column names that you should get (at most)?
Hi @AllisterB you can now download the resources of the Convert a PDF to a Tabular Format video mentioned by @Melissa, along with all other resources of the course (if needed).
This course was so very helpful. once you work it through a couple of times you get to feel as though you are learning techniques that will be useful in solving other issues - thank you Melissa