Power Query: Cleanse, Tone and Upload
13 December 2017
Welcome to our Power Query blog. This week I look at combining several Power Query functions in order to standardise some incoming data.
For today’s example, I will use my reliable fictional salespeople – but as usual, they have thrown some problems into my data.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image1.png/a5ac893c94494bea8fb98505bfdf8601.jpg)
Derek has helpfully entered the expense codes as I would like to describe them, but Mary and Paul have been less consistent. I would like to change the expense codes to the standard names so that I can calculate totals, but without having to go through and change each entry manually – in a large dataset many people may have used ‘Gas’ instead of ‘Petrol’. First, I need to create a query for my data, which I do on the ‘Data’ tab in the ‘Get and Transform’ section. I opt to create a query ‘From Table’:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image2.png/ad77f09b8b58ff3389f68690fdb33016.jpg)
My data is not yet in a table, so this will be done as part of the query creation process; I am prompted for the boundaries and whether I have titles. I can then create my query.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image3.png/3160b301e133b550b2f33b612c75e9ff.jpg)
From the ‘File’ or ‘Home’ tab, I choose to ‘Close and Load To’, in order to create a connection.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image4.png/3e4c1011650b374796067818bcd08415.jpg)
I don’t want to load data to my workbook at this point, as nothing has yet changed.
To replace the non-standard entries in Expense Code, I need a reference table. This avoids the need to ‘hard code’ anything (if you don’t know what this means ask a programmer and watch them grimace). Therefore, I right click on my query and look at my options:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image5.png/d0fcb493ea5684da91e648cd0500d545.jpg)
I have a ‘Reference’ option, so I choose this. The difference between ‘Duplicate’ and ‘Reference’ is subtle: if I choose ‘Reference’, the new query is based on the results of the first table, rather than replicating it entirely.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image6.png/1195244190bdc55b4cf6a8c67b7c8216.jpg)
Thus, it looks just like my first table – I rename this table to ‘Expense Codes Substitutes’ so that it’s easier to understand its purpose. I remove the Name column as I am only interested in the expense codes for now.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image7.png/c7a64a0306790e8fd6253f5c4f5a2251.jpg)
I have too many similar entries, so in the ‘Reduce Rows’ section, I choose the ‘Remove Rows’ dropdown and select ‘Remove Duplicate Rows’.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image8.png/d7c33a019d4b28a9aeaf2efe14940761.jpg)
My next step will be to decide which cells are standard entries and which are substitute entries. Now that my initial data has been reduced to unique values, I need to go back to Excel to indicate manually which values need to be substituted with a standard value. I close and load to the existing worksheet.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image9.png/74ae724feb87329e3473da8b680080b5.jpg)
Next, I need to add a column to my new table called Substitute and reorganise my data so that Power Query can read what and when to substitute for each entered expense code.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image10.png/f18173f2beb7f42eea726e3b27e9ab8a.jpg)
I have indicated substitutes where appropriate (and deliberately missed one!) I now need to create a new query for my edited table:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image11.png/e30b27e8bacee97a2b68280396745853.jpg)
I call the new query ‘Substitutes’ and create it as connection only (since I have not changed it, I don’t need to load it).
Back in the worksheet, I right-click my original query ‘Table1’ and join it to ‘Substitutes’, using the ‘Merge’ option. Since the data in ‘Substitutes’ came from ‘Table1’ this is known as self-referencing.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image12.png/2be72fa7064ec627f5c4f3e91ab7a66e.jpg)
I select Expense Code from each table and choose the ‘Left Outer’ join option and click ‘OK’.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image13.png/0c72ad280cab2efa67dd92b712a1775d.jpg)
I can now expand my Substitutes column:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image14.png/840d8aa57fef5dddace46a2bd79467e6.jpg)
I only want the Substitute column from the ‘Substitutes’ table:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image15.png/b062a23ceeff47612d3a8b3bbc325b59.jpg)
I can now delete the original Expense Code field and rename Substitute to be the new Expense Code… but wait, Mary has no substitute set up for ‘Sundries’ (see I missed one on purpose!). To show that everything can be refreshed, I continue with my deletion and renaming here, and load my query to the same worksheet.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image16.png/718c328f3d7b6d55a3e3de4964fae636.jpg)
I manually edit my ‘Substitutes’ table to make sure all the substitutes are populated (including ‘Sundries’), and refresh the data using the ‘Refresh All’ option on the ‘Data’ tab.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image17.png/8550c622405066b27f7d4efc4c60f91f.jpg)
My data is now updated with standard expense code names. I will still need to maintain the substitute table, but I have no need to trawl through my data to find any non-standard entries that don’t already appear on my substitute table, as an update will add any unrecognised entries to my substitute table. To prove this, I add an ‘A4 folder’ to Paul’s expenses ready to add it to my table:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image18.png/68dd86ba7bb49b309b4439f337800360.jpg)
Obviously, the update is not quite sure how to treat the extra information in my manual table, but a line has appeared and I need to adjust the data. A new row has appeared for Paul in the final table. I simply update my manual table and refresh again.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2017/power-query/dec/pq-54-image19.png/35b72ac0e2e29390f4a27927a2ce6b5b.jpg)
All the data is now showing the correct expense code, as required.
Want to read more about Power Query? A complete list of all our Power Query blogs can be found here.