Power Query: The Right Combination
14 November 2018
Welcome to our Power Query blog. This week, I look at whether to append or merge when combining data.
When I have separate queries that are very similar or related, I don’t want to waste time cleaning them up individually when I can join them together and deal with a big query in one go. I looked at appending queries in Power Query: Appending Files. What follows is an updated segment of the process (since my original blog was way back in Excel 2013 and Excel 2019 is out now!).
I have a standard expense sheet from one of my imaginary salespeople, Derek, which I plan to clean up ready for reports to be extracted. However, I have a very similar sheet from Mary, so it makes sense to append her sheet and clean them up together.
I append the query containing Mary’s data.
I can repeat this (or append more than one table at a time), as long as my data is similar. In fact, if I have a lot of similar tables in one folder, I can append them all – for more on this, please refer to Power Query: One Query, One Folder. Appending will just keep adding rows. There are no checks for duplication – so If I accidentally add Derek’s sheet twice, then the rows for Derek will simply appear twice.
If I want to add more data that is related, but not similar data to this query – for example, the commission information for my salespeople – then append will not help me.
I can select the ‘Commission’ query, but appending it produces unsatisfactory results:
That’s not what I was looking for. When appending, Power Query will only add data to the same column if the column names are exactly the same. In this case, not only are the column names different, but so is the data. To combine data that is related, but not similar, I need to merge my queries.
In order to merge, Power Query needs to know how to join my data together, i.e. the common link or key. In this case, it is the name of the sales person. This is why I have cleaned up the data in my query to populate each row in the Name column. If I take the default join, the left join, then I will get everything from the first query: the expenses, and the matching rows from the second query, which should give me the commission for Derek and Mary, but not the other salespeople. This is correct, as I don’t need to know the commission for the other salespeople in this particular query. (For more information on joins, please see Power Query: Two (Queries) Become One.)
The data from the ‘Commissions’ query has been added as a table, so I need to expand it using the icon next to the title of the column Commissions:
I only need the ‘Commission’ and the ‘Minimum’ fields (and I definitely don’t need the ‘original name as a prefix’!).
I now have all the data together, allowing me to go on to calculate monthly commission amounts.
- if I want to combine queries that contain similar data, I can use Append, but I must make sure that my column names match
- if I want to add related data, where one or more of the columns matches, and some of the data in those columns can be used as a key to join the data together, then I can use Merge, and pick the best join for my purposes.
In any case, if I find that I have picked append when I shouldn’t, or I chose the wrong join, I can simply delete that step and try again! The data that is extracted to Power Query is still safe and unchanged in its original location.
Come back next time for more ways to use Power Query!