Combine, Merge, and Append

I have payroll data files which I want to gather into BI. Originally, I was Combining files but additional columns appeared so Combine started giving me issues. Would Merge be a better fit since the files will not be the same. Probably 18 of 20 columns will be the same. The rows will be different with each file as well. Or, should I just manually create one file in Excel through copy / paste to create a file. I would rather automate thru BI Power Query. I want to be able to take the most recent payroll file each week and add to existing BI file.

I have attached a copy of some of the data in the files and highlighted in yellow some of the columns that get added at various times.

Thanks for any recommendations.

Richard

Merge Jan.xlsx (10.2 KB)
Merge Feb.xlsx (10.0 KB)

Hi,
There is not 1 recipe for this type of data pollution, ranging from bespoke scripts for one situation to conversion tables for another, but manageable.
May I recommend you get the book “Collect, Combine and Transform Data using PQ by Gil Raviv”
He describes, with examples that you can download, how to handle these anomalies.
Paul

4 Likes

Hi Paul,

Thanks. I actually looked back at several books I bought earlier and that was one of them. I will review.

Richard

HI @RBDport

This video might also be helpful.

Simple Transformations using Query Editor

Thanks and have a good day

Najah

I was watching these videos and thought if you would be able to go through some of these and other videos of this course, it can help you with your requirements.

Planning Data Model

Integrating Tables together

Have a good day

Najah

Hi @RBDport, we’ve noticed that no response has been received from you since the 21st of March. We just want to check if you still need further help with this post? In case there won’t be any activity on it in the next few days, we’ll be tagging this post as Solved. If you have a follow question or concern related to this topic, please remove the Solution tag first by clicking the three dots beside Reply and then untick the checkbox. Thanks!

Hi Najah,

Thanks for the help. I will look at those videos today. This weekend I was reading the book that Gil Raviv wrote (Paul’s suggestion) and that holds some promise.

I will be testing some ideas this afternoon.

Richard

1 Like

Hi @RBDport

Have you tried Append in PQ. It will combine columns with same names and for different column will keep the values not present in other file as null.

Thanks
Ankit J

Hi Ankit,

I will look at Append again. I thought the columns and layout had to be the same for both files. I never know which columns may be added and new columns may be added anywhere in the file. I will test this with some other ideas today.

Thanks,

Richard

Hi @RBDport

Not really if you are using Append in PQ. Only thing that matter is column names shall be same. Try and let m know the output.

Thanks
Ankit J

Hi Ankit,

Trying the APPEND and this may work. Not a problem if columns are added but if some not there I have to adjust the Power Query. No big deal.

Having trouble setting up in Power Query a Payroll History File (after APPEND two files) and want to add a third file say the next Payroll Input File to APPEND to History File automatically. It keeps removing the prior Input File data. Any ideas so I can use Input File that appends to a file to create History.

Payroll History File - Jan and Feb appended
Payroll Input File - with March data, and set up to append to History File it works okay and March is added w Jan and Feb
Payroll Input File - with April data (remove March data in file), and PQ is still set up to append Input File. The March data is removed and replaced with April so History File is incorrect.

Thanks,

Richard

Hi Richard,

Have you tried to append one query at a time and doing the same steps for all three (or more of the tables) again instead of appending all in one step?

Regards

Najah

Hi Najah,

Yes, I could do that but then I have to go into PQ every month to add each APPEND. I would prefer to have PQ automatically do it based on adding the most recent file to all the others like COMBINE would.

Regards,

Richard

Hi @RBDport

What you are looking for is Incremental refresh. I have found this article however need to check how to perform with your scenario.

Few Questions

  1. Do these files also has Year present as there can be files for Jan 2020 and Jan 2021.
  2. Are these files huge or Is there any specific reason you want to do Incremental refresh and not Append them in one go as PBI do in Combine.

Thanks
Ankit J

Hi Ankit,

That wont solve my overall problem.

I was reading the Collect, Combine, and Transform ….book by Gil Raviv suggested by Paul and noted on pg 266 on pitfalls about the MissingField.UseNull argument within Table.RemoveColumns . It helps when my columns are missing but needed and will add nulls. This will help for one issue and I will just use the Combine (as before) instead of Merge or Append. Only issue still is when new columns are added but that will not happen very often and I should know about it. I will just have to add the field manually to Advanced Editor.

Thanks for your help, and everyone else.

Richard

Hi,

In the book there is a chapter, 4-7 that describes how to build a conversion table. This might be the best solution.
Also in cases like this, I create a separate query to show me only the header names. This way I can easily find out if someone plays a trick with me again.
(Table.ColumnNames(table)
Paul

2 Likes

Hi Paul,

I read about the conversion table with names as well, but the column names isn’t really my problem. All my files will have the same column names except for the one added at any time. I will try to see if helps with new columns added.

I saw in the book too about creating a query to check the names like you mention. No one else will be in the files I use but me. But, it was interesting what you can do with PQ and M language when you know more.

Thanks for your help.

Richard

@RBDport - do you ever NEED the new columns that are added?
If not, have you considered selecting all the columns you do need (which I assume will always have the same names) and then deleting all other columns? You should be able to do this even with a single report that doesn’t have any of the additional columns.

If that step is built into your query, you can append the tables without needing to constantly check for new columns.

Hi Heather,

Yes, I may need the new columns as they are either new employees in new states that were added or payroll related deductions/earnings that had not been used previously.

I have the query set up now as you suggest to remove unneeded columns so I will have to add any new columns, if needed, to the group.

Thanks for your help,

Richard

Hi Richard,

A simple ‘Combine Transform’ will do it. See the Screenshot attached.

The only caveat is that the Template file (used in the Parameter) must have all column headings that will ever be required. If you ever have a need for more column headings, just change the Parameter Template file to reflect all column headings required.

Ross

1 Like