Please note javascript is required for full website functionality.


Power Query: PDF Pandemonium – Part 3

29 September 2021

Welcome to our Power Query blog. This week, I continue transforming some data that is coming in from a PDF file.

The tent business is doing well, and the UK division have plans to expand the workforce. I have a PDF file, and it contains tables for 10 stores. Last week, I created two Reference Queries, and this week I will continue transforming Pay Scales.

I start by keeping the columns I will be needing for this table.

This means I can concentrate on the data I need to transform for this table.

Having checked the data, I actually only need the pay scale columns, so I select them whilst holding down the CRTL key and click on ‘Remove Other Columns’.

Power Query incorporates this into the existing ‘Remove Other Columns’ step.

I can remove empty rows from the Home tab.

I can now look at how to transform my data from this into a useful table.

I want to transpose the data, but if I do this with some of the data I need in the column headings, I will lose it. First, I need to demote the column headings so that I have the information in a row. I can do this from the Home tab.

This creates a ‘Change Type’ step which I delete as I am not ready to decide column types yet. I am now ready to transpose my data, using the option on the Transform tab.

My data is starting to take shape.

I can rename the headings.

I want to show a start and end salary, rather than have the information in one column. I can split the Salary column from the Transform tab.

From the dropdown, I choose to split ‘By Delimiter’; this brings up a dialog.

I choose to split by space at each occurrence of a space, because this will give me a column with the lower and upper limit.

I delete the automated ‘Changed Type’ step again. I don’t need Salary.2, so I can remove this. As usual, I do this by selecting the columns I want to keep and ‘Remove Other Columns’.

I want Salary.1 and Salary.2 to be numeric columns, so I need to remove the £ signs. I can do this by selecting the columns and replacing £ with blank. I start by using ‘Replace Values’ on the Transform tab.

This provides a dialog where I can enter the details.

This will remove the £ signs.

Next, I change both columns to whole numbers. I can do this from the Home tab or the Transform tab, or by using the right-click menu, and changing the data type.

I get an error, but I can use ‘Replace Values’ again; this time, I choose ‘Replace Errors’.

I want to replace it with null, not zero [0], since zero is the starting point for ‘Pay Scale A’.

Next, I need to transform the Percentage Increase column. If I make it the data type Percentage, I will get values of over 100: I need to divide the values by 100 first, which I can do from the Transform tab, but first I change the data type to ‘Decimal Number’. This will allow me to access the ‘Standard’ dropdown.

I choose ‘Divide’ and enter 100 in the dialog.

I can now set the correct data types for all of the columns.

I rename the salary columns, and my table is ready to ‘Close & Load’.

Next time, I will look at the Stores query.

Come back next time for more ways to use Power Query!