Power BI Blog: Get Data from PDFs
6 December 2018
Welcome to this week’s Power BI blog. Today, we’re going to look at the new preview feature, Get Data from PDFs.
PDFs have long been a source of pain for analysts. Copying and pasting data out of a PDF is highly unreliable: numbers and values arrive unformatted or, worse, incorrectly formatted, and generally need a lot of massaging before they are usable. This feature should change all of that.
First things first: make sure you enable it in the “Preview features” options menu.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2018/power-bi/pbi-ideas/get-data-from-pdf/image1.png/e774d10cbbb9450fc45efbe51abdf434.jpg)
Now, when you Get Data, under File, you can see the PDF option in Beta:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2018/power-bi/pbi-ideas/get-data-from-pdf/image2.png/f32e5a15e2cf9c3e4d2d058458ce054d.jpg)
Conveniently, our monthly newsletter came out earlier this week, so we’re going to connect to that as our data source. If you don’t have a copy, you can scroll down to the bottom of the page and sign up to receive it!
This is what we see once we connect through to it:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2018/power-bi/pbi-ideas/get-data-from-pdf/image3.png/f1140ff857fc3b6f5f97a6a24f4a6fc7.jpg)
Interestingly, there are far more tables than I would have envisaged. Clicking through them, it looks like each set of bullet points has been converted into its own table.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2018/power-bi/pbi-ideas/get-data-from-pdf/image4.png/72aa864d2854c6fefb1083fba0ab5792.jpg)
It just so happens that I know there is a list of training course dates near the back of the PDF, so I’m going to skip ahead to page 40 to find it:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2018/power-bi/pbi-ideas/get-data-from-pdf/image5.png/36776d1da4d05b45bb5a5d09375f407c.jpg)
Editing the data brings it into the Power Query editor, allowing me to promote the first row to be the column headings and bring it into my dataset.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2018/power-bi/pbi-ideas/get-data-from-pdf/image6.png/23912d3b1671861e02bebcd5183f1607.jpg)
All looks pretty straightforward! However, if you go to the Navigation step, you’ll see how it refers to this table in the file:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2018/power-bi/pbi-ideas/get-data-from-pdf/image7.png/6f49c288a0d88a66b427eaf4ece923d6.jpg)
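For anyone who prefers to read the steps in the Advanced Editor, the query built so far probably looks something like the sketch below. This is an approximation rather than a copy of the generated code: the file path is a made-up placeholder, and the step names may differ from what you see on your machine.

```powerquery
let
    // Hypothetical path: point this at your own copy of the PDF
    Source = Pdf.Tables(File.Contents("C:\Reports\Newsletter.pdf")),
    // Navigation step: the table is picked purely by its generated Id
    Table030 = Source{[Id = "Table030"]}[Data],
    // Promote the first row to be the column headings
    PromotedHeaders = Table.PromoteHeaders(Table030, [PromoteAllScalars = true])
in
    PromotedHeaders
```

The point to notice is the Navigation step: the only thing tying the query to the training course list is the text "Table030".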
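“Table030” isn’t a great way to reference it. The name looks like nothing more than a running count of the tables detected in the file, so if next month’s PDF contains a different number of tables, the same name could point to something else entirely. So far, this looks like a good way to bring an ad hoc table into Power BI, but it doesn’t quite work for long-term monthly reporting. Oh well, still better than nothing!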
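If you did want to try this on a recurring report, one possible workaround is to find the table by what it contains rather than by its “Table030” name. The sketch below is untested against the newsletter and rests on a few assumptions: the file path is a placeholder, "Course" is a hypothetical heading standing in for whatever really appears in the first row of the list, and the Kind column is assumed to separate detected tables from whole pages.

```powerquery
let
    // Hypothetical path: point this at your own copy of the PDF
    Source = Pdf.Tables(File.Contents("C:\Reports\Newsletter.pdf")),
    // Assumption: Kind separates detected tables from whole-page entries
    TablesOnly = Table.SelectRows(Source, each [Kind] = "Table"),
    // Hypothetical marker: a heading expected in the first row of the list
    ExpectedHeading = "Course",
    // Keep only the tables whose first row contains that heading
    Matches = Table.SelectRows(
        TablesOnly,
        each List.Contains(
            Record.FieldValues(Table.First([Data], [])),
            ExpectedHeading
        )
    ),
    // Take the first match; this step errors if nothing matched,
    // which is preferable to silently loading the wrong table
    Courses = Table.PromoteHeaders(Matches{0}[Data], [PromoteAllScalars = true])
in
    Courses
```

This swaps a dependence on table order for a dependence on a heading staying the same, so it is only as reliable as the layout of the PDF itself.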
Join us next week for more tips and tricks in Power BI!