Power Query: Portable Document Format
18 September 2019
Welcome to our Power Query blog. Today, I head off to Power Query in Power BI to look at extracting data from PDF files.
Whilst the list of sources available in Excel Power Query has not grown recently, there are more possibilities in Power BI.
The source I am looking at today is PDF.
Having navigated to a particular (publicly available) PDF file containing details of accounts, I have a number of options.
The first table gives me some information about types of account, which looks like it would be easy to clean up.
However, each table gives me only part of the information in the PDF, whereas the page entries contain all the data from the tables on that page (Table01 to Table04 are on Page001, and Table05 and Table06 are on Page002).
I want as much data as possible from the PDF, so I select both pages and click OK.
This creates two queries: one for available accounts (Page001) and one for accounts no longer accessible for new customers (Page002) (whatever that means!).
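For anyone curious about what sits behind one of these queries, the M code looks roughly like the sketch below. The file path is a hypothetical placeholder, not the file used in this post, and the steps that Power BI generates for you may include extra type changes or header promotion.

```powerquery
let
    // Hypothetical path: replace with the location of your own PDF
    Source = Pdf.Tables(File.Contents("C:\Data\accounts.pdf")),
    // Pdf.Tables returns a navigation table with one row per detected table or page;
    // pick the whole-page entry by its Id and drill into its Data column
    Page001 = Source{[Id = "Page001"]}[Data]
in
    Page001
```

The Page002 query would be the same apart from the Id used in the final step.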
I want to have all my data in one query, so I choose to append Page002 to Page001.
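Appending through the user interface adds a Table.Combine step. As a minimal sketch, assuming the same hypothetical file path as a stand-in for the real PDF, the whole thing could be written as a single query:

```powerquery
let
    // Hypothetical path: replace with the location of your own PDF
    Source = Pdf.Tables(File.Contents("C:\Data\accounts.pdf")),
    // Drill into the two page-level entries
    Page001 = Source{[Id = "Page001"]}[Data],
    Page002 = Source{[Id = "Page002"]}[Data],
    // Stack Page002 underneath Page001; columns are matched by name
    Appended = Table.Combine({Page001, Page002})
in
    Appended
```

Because Table.Combine matches columns by name, any column that appears on only one page is filled with null on the rows from the other page, so it is worth checking the column headings before appending.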
This has recreated the contents of the PDF in a single query, and I can now clean up the data and use it in reports and dashboards to help with my analysis. I can’t currently extract from PDF files in Excel Power Query, but it’s something I’d definitely like to see, as many banking apps now export data to PDF files. Pretty soon, let’s hope.
Come back next time for more ways to use Power Query!