How to extract tables from PowerPoint using just Power Query

The organization I work for produces a lot of PowerPoint presentations. Sometimes a presentation is the sole source of data for a workflow.

[Screenshot: Sample.pptx open in PowerPoint]

So I needed a way to extract the data from a PowerPoint presentation.

To achieve this using Power Query, I built on my previous solution for extracting tables from Microsoft Word documents, since the internal formats are very similar: both file types are zip archives of XML files. That post has additional information on how the code below works.

There are a few key things to note:

    1. I’ve adopted Mark White’s UnzipContents function in preference to my own, as his is an improved solution. Note that I’ve made some minor changes to it: I added some rough comments and a facility to extract a specified file from a zip.
    2. PowerPoint slide XML uses <a:tbl>, whereas Word document XML uses <w:tbl>.
    3. I’ve applied PromoteHeaders and RemoveColumns to the tables while they are still unexpanded. This is not strictly necessary, but it is an approach I have had to use in other situations.
    4. To get at an actual table in the output, you will need to click on one of the tables that the function found.
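
Points 3 and 4 can be illustrated with a small standalone sketch. It is not the function itself: the sample data, the Slide and Table column names, and the slide1.xml value are all made up for illustration, and only Clean Table matches the column name in the real output.

```m
let
    // Hypothetical raw table as it might come out of a slide: the header
    // row is still ordinary data and the columns have default names.
    RawTable = #table(
        {"Column1", "Column2"},
        {{"Product", "Units"}, {"Apples", "3"}, {"Pears", "5"}}
    ),

    // Hypothetical stand-in for the function's result: one row per table
    // found, with the table itself left unexpanded in a nested column.
    Found = #table({"Slide", "Table"}, {{"slide1.xml", RawTable}}),

    // Point 3: promote headers on each nested table without expanding it.
    Cleaned = Table.AddColumn(
        Found, "Clean Table", each Table.PromoteHeaders([Table])
    ),

    // Point 4: the code equivalent of clicking a cell in the Clean Table
    // column is to pick a row by position and then the column by name.
    FirstTable = Cleaned{0}[Clean Table]
in
    FirstTable
```

Pasted into a blank query, this should return a two-row table with Product and Units as its column headers. Table.RemoveColumns can be applied to the nested tables in the same way.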

To use the function (ExtractTablesFromPowerpoint, defined further down), you will need to write a query like this:
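
A minimal sketch of such a calling query follows. The file path is hypothetical, and the sketch assumes that ExtractTablesFromPowerpoint takes the path of the presentation as a single text argument; adjust the call if your copy of the function is defined differently.

```m
let
    // Hypothetical path; point this at your own presentation.
    FilePath = "C:\Temp\Sample.pptx",

    // Assumes ExtractTablesFromPowerpoint accepts the file path as text.
    // If your version expects the file's binary contents instead, pass
    // File.Contents(FilePath).
    Source = ExtractTablesFromPowerpoint(FilePath)
in
    Source
```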

You can expect an output like this:

[Screenshot: Query1 in the Query Editor]

If you then click on one of the tables in the Clean Table column, you will see an output such as this:

[Screenshot: Query1 in the Query Editor, showing the selected table]

You will need to create a query called ExtractTablesFromPowerpoint with the following content:

You will also need a query called UnzipContents. The following is my slightly modified and roughly commented version of Mark’s code (any errors or misrepresentations are mine):

Let me know if you run into any issues or have any questions about the approach.

Any feedback is very welcome.

Here is a workbook and a sample file to make applying this easier:

Extract Powerpoint Table v3

Sample