How to extract embedded Excel worksheets from PowerPoint using just Power Query

I’ve been using my solution to extract tables from Powerpoint a lot recently.  However, I’ve come across some situations where what looks like a table is actually an embedded Excel worksheet.

This can be quite useful in Powerpoint as it provides the ability to create subtotals and apply other formulae.

You can add these sort of tables by Insert->Table->Excel Spreadsheet or Insert->Object->Microsoft Excel Worksheet or if you Copy and Paste->Embed.

Of course, it also has the side effect of completely breaking my approach for extracting the data using Power Query.

Initially I thought I was going to have to find the reference in the XML, track it through the relationship file and then to the actual embedded workbook.  However, I quickly realised that there wasn’t much point.

The embedded worksheets always end up in the same folder within the zip file – ppt\embeddings –  so we can just grab them from there.

The code then becomes pretty simple.

  1. Unzip the .pptx file using the UnzipContents function from previous postings
  2. Filter for just files within the embedded folder
  3. Tell PowerQuery to treat the content as an Excel Workbook.

Unfortunately, it didn’t work.

It turns out that, for reasons unclear to me, Powerpoint doesn’t compress the content of embedded Excel worksheets.  This means that when the UnzipContents function tries to decompress them it fails and produces null instead.  I’ve noticed it seems to apply the same approach to embedded images such as JPGs.

There was a quick fix to this though.  I made a slight modification to the UnzipContents function and told it to return the raw binary data instead of null when it fails to decompress.

with:

And voila, PowerQuery was able to open it as an Excel workbook, confirming that the content wasn’t compressed.

The code then becomes:

ExtractEmbeddedWorksheetsFromPowerpoint:

And you can call it from a Query like this:

This should yield a result like this:

2016-08-06 14_36_51-Query1 (2) - Query Editor

From here you can select the table you are interested in and drilldown the sheet:

2016-08-06 14_40_33-Query1 (2) - Query Editor

Here’s a sample and a workbook with a working example:

Sample2

Extract Powerpoint embedded v2

How to extract tables from Powerpoint using just Power Query

The organization that I work for produces a lot of Powerpoint presentations.  Sometimes they can be the sole source of data for certain workflows.

2016-06-13 12_30_15-Sample.pptx - PowerPoint

So, I needed to develop a solution to extract the data from a Powerpoint presentation.

In order to achieve this using Power Query, I leveraged my previous solution for extracting tables from Microsoft Word documents as the internal formats are very similar.  If you refer to that post, you will find some additional information regarding how the code below works.

There are a few key things to note:

    1. I’ve adopted Mark White’s UnzipContents function in preference to my own.  His is an improved solution.  Note, I’ve made some minor changes to it by adding some rough comments and adding a facility to extract a specified file from a zip.
    2. Powerpoint slide XML uses <a:tbl> whereas Word document XML uses <w:tbl>
    3. I’ve applied PromoteHeaders and RemoveColumns to unexpanded tables, not strictly necessary but an approach I have had to use in other situations.
    4. In order to get at the actual table in the output, you will need to click on one of the tables located.

To use this function you will need to write a Query like this:

You can expect an output like this:

2016-06-13 12_25_49-Query1 - Query Editor

And then if you click on one of the tables within the Clean Table column, an output such as this:

2016-06-13 12_27_36-Query1 - Query Editor

You will need to create a Query called ExtractTablesFromPowerpoint with the following content:

You will also need a Query called UnzipContents, following is my slightly modified and roughly commented version of Mark’s code (any errors or misrepresentation are mine):

Let me know if you run into any issues or have any questions about the approach.

Any feedback is very welcome.

Here is a workbook and a sample file to make applying this easier:

Extract Powerpoint Table v3

Sample

How to extract Queries from a workbook using just Power Query

Had success today in developing a way of extracting Power Query queries from a workbook using just Power Query:

2016-05-29 12_14_24-Extract Queries v4.xlsx - Excel
(All the query text is in there, but Excel is limited to a row height of 409 which isn’t quite enough here)

I’ve been meaning to look into how query text is stored in Excel workbooks for a while.  I had assumed that they were found somewhere in the data model.

However, I was able to determine that they were actually stored, albeit in a rather convoluted form, in a connections.xml file within the Excel workbook .xlsx file.

Grab any Excel workbook containing Power Query queries and rename it to something .zip.

Then locate the connections.xml file in the xl folder.

If you have a look at it you’ll see what mostly looks like binary data.  But it’s in a particular form that I suspected was Base64.

So I converted the Base64 into binary to see what it was.  Turned out to be a zip file.  So I decompressed that, and hidden away inside was a text file called Section1.m containing all the queries in the workbook!

So – to recap, the text for your queries is stored in a text file in a zip file that has been converted to Base64 and placed inside an XML file that has then been turned into a zip file.  Or something like that.

And here is the code – all in Power Query – that can pull out all the queries from a specified workbook into an Excel table: (Yes – it needs a bit of a clean up!! – but gives you an insight into how I approach things)

Note – I kept running into a weird bug where it would mess up the line containing #”Replaced Value5″ whenever I opened it up in the Advanced Editor.

It would replace it, incorrectly, with:

and then helpfully report an error!

I had to keep replacing it afresh with:

This shouldn’t be an issue if you don’t open up this function in the Advanced Editor.

I also used a version of my DecompressFiles function which I hacked a bit to return the data as text rather than XML.  Sorry, it needs a bit of tidy up too but it did the job:

And to call the ExtractQueries function I had another little query like this:

Now, this should work in most cases, but I haven’t tested heavily.

To make it easier, I’ve uploaded a workbook below pre-loaded with the functions and queries and also the sample file they use.

Let me know how you go!

Workbook with function and queries: Extract Queries v5

Sample file it successfully queries: Compare-Tables

Power Query – Converting a hexadecimal number to decimal

There was an excellent blog post by Greg Deckler a few weeks back entitled Using Recursion to Solve Hex to Decimal Conversion. I thoroughly recommend reading to get some additional context.

However, I thought I’d try an alternative approach to converting hex to decimal that does not use recursion.

Converting a hexadecimal value into decimal in Power Query is actually pretty straightforward, for certain test cases.

For example, you can write:

And this will return the correct answer of 32154.

Which is great – but of course, it will only work for hex value comprising 4 characters.

So I’ve produced a more generalised solution:

Let me know if you would like a walkthrough of exactly how this works.

How to use Youtube API from Power Query – Part 3a (Mini Post)

Unfortunately, I’m not able to produce a full blog post today.  However, I’ve had a number of people requesting information regarding how to get statistics for multiple Youtube videos at once.

The basic principle is to send a list of video ids to the Youtube API separated by commas.

Below I have produced some “M” code that combines the output of my two previous blog posts.  That is, it –

1) Looks for the top 50 most popular videos
2) Produces a list of the video ids for these videos
3) Requests the statistics for these videos
4) Combines the original table with the statistics

I will do a full detailed walkthrough of how to produce this code in a future post.  But, if you’re feeling adventurous, create a Blank Query and copy and paste the below into the Advanced Editor.  You will need to copy your API key into two locations in the code.

With any luck, you should end up with output looking similar to this:

2016-05-15 12_29_03-youtube v2.xlsx - Excel

Let me know how you go. And if you have any issues don’t hesitate to ask.

I’ll try to get a full blog post up on this as soon as I can.

How to use Youtube API from Power Query – Part 2

This post follows on from my previous post regarding the use of the Youtube API from Power Query.

You will need to follow the steps from the previous post to ensure that you have a browser API key to the YouTube Data API v3.

In this post, we’ll be exploring how to get the number of views, likes, dislikes and comments for a video.

To achieve this we will use a videos query rather than a search query.

I’ve picked a video at random:

Its video id can be taken from the URL and I have highlighted it in bold below:

https://www.youtube.com/watch?v=MtN1YnoL46Q

You can easily get this value for any video on YouTube.

Now, we can get into Excel and Power Query.

Start with a Blank Query and paste the following into the Advanced Editor:

Expected result:
2016-05-01 11_00_08-Query3 - Query Editor

This query requests the statistics for the Duck Song video (which has the id of MtN1YnoL46Q).

You can review the Youtube API documentation for more detailed information.

You could actually enter this directly into your browser if you wanted to. It is equivalent to:

https://www.googleapis.com/youtube/v3/videos?key=<your api key here>&id=MtN1YnoL46Q&part=id,statistics

Now, if you click on the items ListPower Query will return a list of the results as records.

2016-05-01 11_06_35-Query3 - Query Editor

They will be easier to work with as a Table so next we’ll convert to a Table.

2016-01-24 11_42_07-Youtube Search List Merge - Query Editor

Returning this:

2016-05-01 11_10_13-Query3 - Query Editor

If you then expand out the column, you can start to see all of our data:

2016-05-01 11_11_05-Query3 - Query EditorThen expand the statistics column and here we go – all the information we wanted regarding views, likes, dislikes and comments:


2016-05-01 11_12_12-Query3 - Query Editor

So at time of writing, this video has been viewed by 212 million people! 731,916 liked it, 95,318 disliked it and it has 197,897 comments.

Here’s all the code:

It is relatively easily to request statistics for multiple videos/channels at once. Can cover that in a future post if there is interest.

Please let me know if you encounter any issues or need further clarification.

Enjoy!

Power Query, Proxies – Parsing tables from Microsoft Word documents

Our proposal letters are Microsoft Word documents which include embedded pricing tables.  It is useful for me to have this information in Excel.

I’m going to explore an approach for pulling tables from Word documents into Excel using the small Python web server used in previous posts and Power Query.

As per previous post, you need to have Python (both 2.x and 3.x should work) installed.  This time, we’ll be using the python-docx library to do the heavy lifting for us.  Of course, will also need the bottle web framework.

I won’t cover the installation of these here, but happy to assist if you run into trouble.

Here’s the code for the Python web server:

This Python code will run a small web server that listens on port 8080 and will respond to requests sent to http://localhost:8080/extractWordTables

It is expecting to be sent a query string with a Word document’s filename e.g. http://localhost:8080/extractWordTables?filename=myworddocument.docx

Once it has a Word document it attempts to parse it and look for tables.  To do this it uses Steve Canny’s fantastic python-docx library.

In using it, the process steps through the tables, row by row, cell by cell and converts the data into an HTML document containing one or more tables.

To use this from Power Query you need to execute a Power Query -> From Web query.  Then enter something like:

http://localhost:8080/extractWordTables?filename=WordDocWithTables.docx

You may need to tell it to convert the content to a Web Page, but you should end up with M code looking something like this:

You can then select the table that you want to work further on.

I have adapted this approach to work with multiple files, so you could essentially bulk parse a set of Word documents.  However, it relies on the documents containing tables with the same number of columns etc.  I’ve also added additional features to the Web server to allow me to filter what tables are returned based on the column headings.

Can provide further detail if there is interest.

Finally, I’m exploring other options for doing all of this directly in Power Query, but I do not have a working solution just yet.