How to extract Queries from a workbook using just Power Query

Had success today in developing a way of extracting Power Query queries from a workbook using just Power Query:

2016-05-29 12_14_24-Extract Queries v4.xlsx - Excel
(All the query text is in there, but Excel is limited to a row height of 409 which isn’t quite enough here)

I’ve been meaning to look into how query text is stored in Excel workbooks for a while.  I had assumed that they were found somewhere in the data model.

However, I was able to determine that they were actually stored, albeit in a rather convoluted form, in a connections.xml file within the Excel workbook .xlsx file.

Grab any Excel workbook containing Power Query queries and rename it to something .zip.

Then locate the connections.xml file in the xl folder.

If you have a look at it you’ll see what mostly looks like binary data.  But it’s in a particular form that I suspected was Base64.

So I converted the Base64 into binary to see what it was.  Turned out to be a zip file.  So I decompressed that, and hidden away inside was a text file called Section1.m containing all the queries in the workbook!

So – to recap, the text for your queries is stored in a text file in a zip file that has been converted to Base64 and placed inside an XML file that has then been turned into a zip file.  Or something like that.

And here is the code – all in Power Query – that can pull out all the queries from a specified workbook into an Excel table: (Yes – it needs a bit of a clean up!! – but gives you an insight into how I approach things)

Note – I kept running into a weird bug where it would mess up the line containing #”Replaced Value5″ whenever I opened it up in the Advanced Editor.

It would replace it, incorrectly, with:

and then helpfully report an error!

I had to keep replacing it afresh with:

This shouldn’t be an issue if you don’t open up this function in the Advanced Editor.

I also used a version of my DecompressFiles function which I hacked a bit to return the data as text rather than XML.  Sorry, it needs a bit of tidy up too but it did the job:

And to call the ExtractQueries function I had another little query like this:

Now, this should work in most cases, but I haven’t tested heavily.

To make it easier, I’ve uploaded a workbook below pre-loaded with the functions and queries and also the sample file they use.

Let me know how you go!

Workbook with function and queries: Extract Queries v5

Sample file it successfully queries: Compare-Tables

9 thoughts on “How to extract Queries from a workbook using just Power Query”

  1. Hi Ken,
    that’s pretty awesome again!

    Thought I could use it the other way around as well in order to “transplant” translated M-code into files. For example when you need to change your connections in multiple queries and don’t want to edit every single query. But this method seems to have problems when there are multiple queries in one file. It errors on the function “DecompressFiles” with this message: “[DataFormat.Error] Block length does not match with its complement.”
    Any idea on how to solve that?

    1. Hi Imke, Thanks! I’m not sure how practical it is yet, but it was really just a proof of concept. Was lots of fun to do! The error that you are getting means that it isn’t recognising the binary content as a zip file. I would need to take a closer look at your sample queries/data to know exactly what the issue is. It may be that I’m not catering for all scenarios when I parse out the Base64 or maybe something else altogether.

      I have had it working in my own workbooks where there were multiple queries. But I confess I haven’t tested extensively.

      You were planning on taking your output and then using Expression.Evaluate and it failed before you got that far?

      1. Thanks Ken!
        No, it failed when trying to apply your technique to a file with multiple queries.
        As it looks, the problem seems to occur once one of the queries is loaded to the PP-Data-Model.
        But this doesn’t seem to be the only factor. Got another file where it didn’t work, but no load there.
        Some files with multiple queries work fine – so quite a mixture at the moment…

        Do you have any idea where the code for the functions is stored? 🙂

        1. Ah I see. So I suspect the issue is when it is parsing the connections.xml file. It needs to correctly extract the Base64 that is held within the “Extended Properties” attribute. So I think there must be some scenarios that I haven’t catered for. Eg. Other attributes. You could comment out the ExtractQueries parameter list and hard code the Filename (whilst fixing the Results5 quot; issue) and see where it is failing. Or happy to help if you are able to upload the workbooks that are failing.

          I’ll take a look at the issue when the query is loaded to Data Model.

        2. I’m wondering if the issues arises when using Cube formulae. All my other workbooks work fine. And when I look at connections.xml the query base64 is not present at all – for any of the queries. Need to dig deeper.

    1. Thanks Maxim. It seems to work on most of the files I pass to it. But my focus is more Power Query than DAX. When i do try workbooks with heavier use of DAX/Power Pivot they seem to be less likely to work – but plenty do. XLSB – yes, no chance of them working – I’ve never dug into the innards of that format.

  2. Wow….Nice work, Ken 🙂
    One small note… will be better when you replace a part of #”Replaced Value5″ step.
    I mean this
    “"”
    to this
    “&”&”quot;”
    Then you can watch the code in advanced editor without any issue.

    Regards

Leave a Reply

Your email address will not be published. Required fields are marked *