Please note javascript is required for full website functionality.
MVP

Blog

Power Query: The Delete Dilemma

22 July 2020

Welcome to our Power Query blog.  This week. I look at data from a sheet made up of a number of tables, with a random amount of data that needs to be excluded.

 

Mary, one of my imaginary salespeople, has sent in a slightly unusual report of the tents available from a particular supplier.

I want to put the tent, supplier and report date into a table, and delete the lines at the top (there will be an unknown number of these).  I begin by extracting my data to Power Query, using ‘From Table’ on the ‘Get & Transform’ section of the Data tab.

The ‘Create Table’ dialog box tries to default to the table I happen to be in, but I need to change this to include all of my data; instead of using ‘From Table’, I will adopt another approach.

I select ‘From Workbook’ in the dropdown from the ‘From File’ option on the ‘New Query’ options, and select my current workbook.

I select the sheet.

I can see that all my data has been included; I can now choose to ‘Transform Data’.

Power Query has automated some steps, but I don’t want to include most of them, so I delete all steps after the ‘Navigation’ step.

I need to remove the top rows, but I want to avoid specifying how many of these rows there are.  I plan to delete all rows above the date value. I begin by adding an index from the ‘Add Column’ tab.

Next, I create a duplicate column of Column3 (which contains the report date value) on the ‘Add Column’ tab.

In order to get rid of all the data apart from the date, I convert Column3 – Copy to Data Type ‘Date’.  This may be done from the ‘Transform’ tab.

This means that apart from the date, the data values are now either null or Error.  If I right-click my column, I can replace the errors with null values.

This gives me null values, and I can now fill down by right-clicking, choosing to ‘Fill’ and then choosing to ‘Fill Down’.  The rows above the date do not matter as I want to delete them.

I need to delete the null value rows, so I filter and choose values that are not null.

This gives me the rows with a date value.

I have the date column.  Now, I want to get the supplier into a column. I create another duplicate of Column3:

I convert the Data Type on my new column to ‘Date’ because I want to isolate the value with the date.

I create a new conditional column from the ‘Add Column’ tab.

If the value is not null, then I want to put the supplier from Column2 into my new column, which I call Supplier.  The values with errors will remain as errors. 

I can now right-click on Supplier and replace the errors with null.

Having done this, I can fill down my supplier, as before.

I have one more row to remove, and I can do this by filtering on Column1 to remove null values.

I have one final row to delete, but this time it has values in all columns.

The data I need in the top row has already been moved to columns.  I could just delete the top row, but instead I choose to identify it by creating a new conditional column.

I choose to create a new column which will indicate a row should be deleted if the value in Supplier matches the value in Column2.

I can then filter the Delete Me column to remove everything that isn’t null

I now consider the column headings, but first I want to remove Column3 – Copy1 because the errors might cause problems.  I also remove my other temporary columns at the same time.

I can now promote the first row to headers from the Transform tab.

I just need to tweak the supplier and date headings.

My final step is to ‘Close & Load’ my data into Excel.

If I go back and alter the data by adding some extra rows to be deleted, I can check my transformation.

I create another line at the top and an extra space before the tent data.  Then, I refresh the query.

The extra lines appear in the source.

The data is still transformed correctly.

 

Come back next time for more ways to use Power Query!

Newsletter