Compare two tables or reports using Power Query

25/08/16 – updated to correct minor errors that may stopped this solution from working on your reports 

I receive a lot of reports on a daily, weekly or monthly basis.  Usually the reports are little changed from the previous version but there is no easy way to work out just what has changed.  I suspect that this is a pretty common issue so I’ve written a general solution in Power Query.

This function accepts two tables/reports as input and provides a report of just what has changed.

The functions should work on practically any report and could easily be adapted to compare sheets in separate workbooks, but to demonstrate, I’m going to compare this simple table:

2016-03-20 11_12_33-Compare Tables v12.xlsx - Excel

with this table:

2016-03-20 11_13_00-Compare Tables v12.xlsx - Excel

and produce this output:

2016-03-20 11_13_15-Compare Tables v12.xlsx - Excel

In comparing two tables, there are a 5 kinds of variances we need to look for:

  1. Added rows – rows in second table but not in the first (Chris Masters in our example)
  2. Removed rows – rows in first table but not in the second (eg. Penny Bradley)
  3. Added columns – columns in second table not in the first (eg. Status column)
  4. Removed columns – columns in first table not in the second (eg. Loyalty Rating column)
  5. Changed rows – rows that are in both tables but with one of more changed values

The first 4 variances are fairly easy to detect.

For rows, you can use Table.NestedJoin() to JOIN the tables and, using a joinKind of LeftAnti, return rows that either only exist in one table or only exist in the other.

For columns, you can using the List functions to compare a list of Table.ColumnNames from each table and find those which are unique to one or the other.

Detecting changed fields within rows is a bit more complicated…

The approach I took was to merge the two tables so that each row contained both the old data for the row and the new data for the row.  Then I converted the rows to lists.

I then compared the old value for each field with the new value for each field and used Power Query’s Swiss army knife List.Generate() (which I described in a blog post here) to produce a list of all the changes.  I haven’t commented this in detail, so you may wish to refer to the blog post to get a better sense of what I am doing.

In order to be able to compare the two tables, they need to have a common unique column or primary key.  If this is not the case, you may need to create one by combining two or more columns.  In some cases, it may be OK to use a numeric index field.

The function also accepts a DontCompare parameter. This is a list of fields that you want to exclude from the comparison.  This may include fields that you are not interested in or that always change e.g. Report Date.   In this example I have excluded the Last Updated field.

There is also an AlwaysInclude parameter. This is a list of up to 4 fields that you want to always include in the output.  This is to allow you to see more easily what rows have changed.  In this example, I included both the Customer Id and the Customer’s Last Name.

Here is the commented code for the two functions (the parent function CompareTables that accepts two tables and the child function CompareRows that compares fields in a single row):

CompareTables:

CompareRows:

And here’s an example of how to use:

Please feel free to ask questions or make comments below.

To make this easier to follow, please find a copy of the example workbook here.  Example workbook updated 25/8/16

Compare-Tables – fixed

22 thoughts on “Compare two tables or reports using Power Query”

  1. I so very much appreciate what you have created here. It is just what I so very much need and have not seen anywhere else. However, I can’t seem to get it to work with my data. Your example works just fine for me, but when I make even the smallest change, (e.g. changing the PrimaryKey) I get, “An error occurred in the ‘’ query. Expression.Error: The column ‘Column1’ of the table wasn’t found.”

    Before I send any of my particular parameters, I was wondering if off-the-bat you knew what was causing this. Not sure what the “in the ‘’ query” is referring to.

    Matthew

    1. Hi Matthew, I did recently notice ia minor error in my code which is probably what is causing your problem. I’ll take another look and post a correction.

      1. Thanks tons Ken,

        I have found that as long as I don’t try to change the PrimaryKey, it seems to work fine. For now I am using your “Customer Id” and all is good.

  2. This seems to be sensitive to the order of columns. CompareRows doesn’t find a change in my first column of data in OldData. If my first column in OldData is a field not compared (either because it is excluded or is a key field which can’t have differences) it works.

    I haven’t been able to find the source of the problem. However, if you keep the key field as the first column as per your example, you don’t see the problem.

    Nevertheless, thanks for a very useful solution!

    1. I’d have to take a look Jamie. Your conclusion is probably correct. That said, I did start work on a more general solution to this issue which wrapped an updated version of this code in VBA and allowed comparison of any workbooks/worksheets. From memory it may solve your problem.

  3. Great walk through of the different types of join operations available in Power Query. This is going to save me quite a bit of time and spare me a lot of manual work each month. Thank you!

    1. Thanks Andre. Glad you’ve found it useful. I do have an updated solution that allows you to select files (VBA), deal with upper/lower case, skip rows etc. It’s also plug and play so you can use without looking at the Power Query at all.

  4. Hi Ken
    This is awesome.
    You just saved huge amount of time for me.

    Can you help me with this?
    I want to treat combination of “Firstname.Lastname” as a primary key to check the change of data in columns.
    Is that possible?

    1. Hi Ken
      I modified the script and now i am able to make Firstname and Lastname combination as primary key.

      I am trying to modify script further to display all unchanged items under new status called “new change”. But having difficulty.
      Do you have any suggestions?

      Thanks heaps

Leave a Reply

Your email address will not be published. Required fields are marked *