Please note javascript is required for full website functionality.


Power Query: Linking Up

26 December 2018

Welcome to our Power Query blog. This week, I look at how to view links in a webpage.

Unlike my usual Power Query blog, I am going to look at something that is impossible in Power Query (at the moment). Well, it is and it isn’t. Today’s Power Query lives in the Power BI application, and some M functions are available from Power BI, but not yet supported when running Power Query from the ‘Get & Transform’ section of the ‘Data’ tab. If I need to work with webpages, then running Power Query from Power BI is very tempting.

To access queries from Power BI, I use the ‘Edit Queries’ option:

Once in the Power Query Editor, I can choose to create a query from ‘New Source’ in the ‘New Query’ section on the ‘Home’ tab. I will create a new Blank Query:

In my query, I am going to use two M functions that are not available to me when I run Power Query from Excel.  The first is:

Web.BrowserContents(url as text, optional options as nullable record) as text

This returns the HTML for the specified url, as viewed by a web browser.  An optional record parameter, options, may be provided to specify additional properties. 

The second is:

Html.Table(html as any, columnNameSelectorPairs as list, optional options as nullable record) as table

This returns a table containing the results of running the specified columnNameSelectorPairs against the provided html.  An optional record parameter, options, may be provided to specify additional properties. 

I am going to combine these functions.  To show what each one does, I will use Web.BrowserContents() first.

By using

= Web.BrowserContents("")

I can see the contents of the famous website,  Next, I need a step that will extract any links from this text:

This has been achieved using M code

= Html.Table(Source, {{"Link", "a[href^=""http""]", each [Attributes][href]}})

I don’t claim to have an encyclopaedic knowledge of the CSS selectors needed for this functionality – but a list of CSS selectors can be retrieved with a quick search online.  This particular code searches all the <a> elements (as shown on the WebBrowserContents() example screen), where the href attribute begins ‘http’ (and so calls a link).

I made this brief foray into Power BI territory because these functions are useful tools, which I hope to see in Excel’s Power Query / Get & Transform very soon (please?).

Come back next time for more ways to use Power Query!