Please note javascript is required for full website functionality.

Blog

Monday Morning Mulling: August Challenge

28 August 2017

Last Friday, we posed perhaps one of the hardest challenges that we’ve had so far: can you create a file that will refresh your Power Query queries, and pop up a message box letting us know when it’s finished, without disabling background refresh (and therefore rendering Excel unusable)?

I hope that you’ve all had a frustrating weekend playing around with this, because this one is a doozy. Our solution here takes advantage of not one, but two separate quirks of Power Query and the way it refreshes data. It’s also particularly inefficient, and we’re hoping that Microsoft will eventually make some tweaks to allow this to be a bit more elegant. Let’s explain.

The first quirk is that refreshing queries will trigger calculation updates in Excel. This means that we can use a query update to trigger a calculation event in VBA. So our thought was to create an extra query that would simply contain a single random number using the Number.Random() function in Power Query. This would create an output table that would contain a number that changes whenever the query is refreshed. We can set up a VBA event to check to see if the number as changed. This required a few steps: set up a public variable that can be updated, assign the variable a starting value when the model is opened (to ensure that it doesn’t pop up a message when any calculation is performed) and then run a check to see whenever the value changes.

Public RandomNumber As Double

 

Private Sub Workbook_Open()

RandomNumber = Range("FinalQuery").Value

End Sub

 

Private Sub Workbook_SheetCalculate(ByVal Sh As Object)

If Range("FinalQuery").Value = RandomNumber Then

    'Do Nothing

Else

    RandomNumber = Range("FinalQuery").Value

    MsgBox "Query Refreshed!  All data is now updated.", vbOKOnly, "Power Query"

End If

End Sub

In principal, this should work.  In practice, this fails because Power Query will go ahead and run all queries that it can simultaneously.  Because the random number query is so simple, it finishes first and triggers the macro before the other queries can be completed.  This brings us to the second quirk that we have to take advantage of.

Those of you who follow Ken Puls may be aware that he recently wrote an article explaining why power query refresh speeds suck.  There are two things to note on this – the first is that Power Query tends to go back to the data source more frequently than it should in order to refresh tables – we’ll come back to this point later.  The second thing isn’t explicitly noted in his article, but it’s nicely highlighted in one of the pictures: queries are run in order of hierarchy – that is, those that rely on others will be delayed until the precedent queries are run.  Queries that are on the same level can be run simultaneously.

So what we can do is create a query that will reference all of the other outputs, create our random number, then get rid of all the underlying data from the other queries, so that all we’re left with is the random number.

Here’s the M code that I’ve used to make it happen:

let

    Source = Table.Combine({#"Output Table", #"Output Table 2", #"Output Table 3"}),

    #"Added Custom" = Table.AddColumn(Source, "RandomNumber", each Number.Random()),

    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"RandomNumber"}),

    #"Kept First Rows" = Table.FirstN(#"Removed Other Columns",1)

in

    #"Kept First Rows"

This creates a table in Excel, containing just a single random number. Here’s where there’s actually yet another quirk to note about query refreshing: for the Calculation event to trigger, the value needs to be a precedent cell in a calculation.

So by doing this, we can force the random number to be the last thing to be calculated, which will result in our message box above appearing only when all queries have been refreshed.  The timing of this might be a little bit random, as it’s linked to how long it takes for the query to run, but it should still trigger when it happens.  Fortunately, if you’re in the middle of entering in a formula in a cell, calculations are intrinsically switched off while you’re editing, so you’re not going to be interrupted with a pop-up until after you finish with your formula.

So, what’s wrong with this?  Well, it goes back to the point earlier about the efficiency of Power Query.  If you think about what happens in this random query, it’s calling upon the three initial queries.  Due to the way that Power Query operates at the moment, it’s actually triggering another call back to the underlying data, which is effectively going to double the refresh time in this instance.  

In the future, I have no doubt that this will be fixed and that the data will be cached into the staging queries accordingly. However, until that time, I can only suggest that you check out this UserVoice request and vote it up – at some point, the Excel team will get around to implementing the much more efficient process that Power BI uses to cache query data at each step.

So there you have it – did you find a better solution? Let us know, we’d love to hear from you!

Newsletter