Please note javascript is required for full website functionality.


Power Query: PDF Pandemonium – Part 2

22 September 2021

Welcome to our Power Query blog. This week, I start to transform some data from a PDF file.

The tent business is doing well, and the UK division have plans to expand the workforce. I have a PDF file, and it contains tables for 10 stores. Last week, I imported my data, viz.

The key to making my transformations as immune to change as possible is to keep the data I need rather than delete the data I don’t. Looking at the columns, the easiest way to see if there is any useful data in there is to use the filter icon; Column1 is clearly very useful.

However, Column8 is not:

However, rather than delete Column8, I should keep what I need. On the Home tab, there is an option to ‘Choose Columns’:

I can use this to specify columns I want to keep. It’s much easier than selecting them all for large tables!

I choose to select the first seven [7] columns.

I can see that the heading data from the tables is in Column1, which suggests that transposing my data would be useful. I can do this from the Transform tab.

This swaps the rows and the columns and is much closer to the format I want to see.

I can check the data in my columns again to see which ones I want to keep. However, it is clear that this time the column names will change with the extra text that is present in my source data.

Before I decide which columns to keep, I need some way of identifying them. I will promote the first column to the column headings, which I can do from the Transform Tab.

I choose ‘Use First Row as Headers’:

Power Query has created a ‘Changed Type’ step, but this references column names, so I delete it. I can pick the columns I want to keep in the same way as I did earlier.

I have the data I want to keep, but there are two tables in here: the store data and the pay scales.

I can keep this query, which I will call All Data, and make Reference queries: one for the store table and one for the pay scales table. I can create reference queries from the ‘Home’ tab.

I described the benefits of using reference queries in the blog Reliable References. I call this Reference Query Pay Scales.

I also create another Reference Query, Stores. I will transform Pay Scales next time…

Come back next time for more ways to use Power Query!