Custom web proxies and Power Query – decrypting Google Chrome cookies

It turns out that decrypting Google Chrome cookies is a non-trivial process, so I thought I’d use it as an introduction to the use of custom web proxies.

There are functions available in the .NET framework that allow you to decrypt Google Chrome cookies, however, they are not directly accessible via Power Query (as far as I can tell) and even with VBA you would need to write a COM wrapper.

Instead, I decided to write a very short script in Python which would decrypt the cookies and send the result back to Excel/Power Query.  How to do that?  Well, by calling the running Python script from a Power Query Web contents query.

The solution below has two parts, a Python web server and Power Query.  Power Query imports the Google cookies database.  It then executes a Web query to contact the Python web server and send it a list of encrypted cookies.  The Python script receives the encrypted cookies, decrypts them and send back the result.  Power Query takes the decrypted cookies and combines it with the other data it has for the cookies.

To make this work, you need to have Python (both 2.x and 3.x should work) installed.  You’ll also need the Python extensions for Windows (for the win32crypt library) and the bottle web framework.

I won’t cover the installation of these here, but happy to assist if you run into trouble.

So basically, this short script starts a tiny local web server.

Assuming you run it from your desktop PC it will respond to requests sent to http://localhost:8080/decryptCookie

And if we send it a list of encrypted cookies, it will decrypt them using Microsoft’s libraries and send back the decrypted cookies.

For the purposes of this exercise, we will manually launch it, but we could trigger it in a more sophisticated fashion directly from Excel.

The M code on the Power Query side of things, builds on the post from yesterday which showed how to access the Google Chrome cookie SQLite database.

I did run into a strange issue where the query I put together yesterday was truncating the encrypted cookie field from the cookies database.   For some reason, this was corrected by adding a SQL query – select * from cookies rather than simply connecting to the table.  I would have expected this to produce identical results but it didn’t.

And here is the annotated M query:

Happy to walkthrough this in more detail if necessary or respond to questions.

This data can now be used much like the “harvested” Internet Explorer cookies from other posts.

Bear in mind, that there are countless other uses for custom Web proxies with Power Query. For example, you can use them to perform more complex web authentication, web scraping with BeautifulSoup (no more crazy HTML/XML guesswork), data manipulation using Python libraries such as pandas, execute other applications on your computer,  etc. etc.

Of course, on the flipside, you need to have Python installed and have the Python script running before it will work.  On your home PC, this probably doesn’t matter.  In a corporate environment, you would want this to be hosted on a proper web server.

How to access an SQLite database from Power Query

SQLite is an increasingly common database format for both mobile applications and small desktop applications.  Many applications on iOS/Android use them and Google/Mozilla use them for their browsers.

Reading an SQLite database using Power Query isn’t difficult but does require a few initial steps.

To begin with, you need to install an ODBC SQLite3 driver.

I recommend using the one located here:

http://www.ch-werner.de/sqliteodbc/

 

2016-02-01 11_47_35-SQLite ODBC Driver - Internet Explorer

Make sure that you install the correct version – either 32 bit or 64 bit.

It would also be a good idea to close Excel and re-open before going any further.

Now we create an ODBC query, by selecting From Other Sources->From ODBC:

2016-02-01 11_37_38-chrome cookies.xlsx - Excel

The next step is to specify a connection string.

Here’s an example that will read the Google Chrome Cookies database (assuming you have Chrome installed and specify your username):

DRIVER=SQLite3 ODBC Driver;Database=C:\Users\<your username>\AppData\Local\Google\Chrome\User Data\Default\Cookies;LongNames=0;Timeout=1000;NoTXN=0;

You will then get asked what credentials to use.  In our case, we just want the default of no credentials:

2016-02-01 11_43_28-

You will then see a dialog box asking you to select a table.  For our example, select the “cookies” table

2016-02-01 11_56_42-

and – hopefully – you will see a table containing all the cookies from Google Chrome:2016-02-01 12_14_49-cookies - Query Editor

You will note that most of them are encrypted and therefore not immediately available for us to use.

That’s a problem for another day :)

Although this example used the Google Chrome Cookies sqlite3 database, this approach should work with minimal change for most sqlite3 databases.

 

 

Using captured Cookies in Power Query

Unfortunately, Power Query often does not reliably maintain the cookies needed to parse data from complex sites. However, it is possible to sidestep this issue by passing through the cookies manually.

In a previous post, I demonstrated how to parse Internet Explorer cookie files and bring them into Excel.

In this post, we’ll transform those cookies into a cookie string and add to a query.

Cookies need to be specified in the following format:

Cookie1=value1; Cookie2=value2; Cookie3=value3;

This is quite different from the table of cookies that we had at the conclusion of my previous post.

But we can use Power Query to fix that.

Start with a new Blank Query and refer to the #”IE Cookies” query that we had at the end of the previous exercise.

Source = #"IE Cookies"

Then filter to just the cookies needed.

Do this by filtering the Host/path column for those cookies corresponding with the domain of the website that you are parsing e.g. flipboard.com.

(NB. This is only going to work for sites that maintain cookies between sessions.  Often these sites will have a “Remember Me” box that you can tick.)

2016-01-31 10_07_20-Query1 - Query Editor

Having done this, remove other columns except for Cookie Name and Cookie value.

2016-01-31 10_07_59-Query1 - Query Editor

Now to convert our table into a cookie string for our query:

Step 1:

Merge the columns using “=” as our separator and remove other columns

2016-01-31 10_12_02-

 

Resulting in:

2016-01-31 10_08_58-Query1 - Query Editor

Step 2:

Transpose the remaining Merged column into a single row

Step 3

Merge these columns using “; ” (note the space) as the separator and remove other columns

2016-01-31 10_12_54-Merged - Query Editor

Step 4

Drill down:

2016-01-31 10_10_59-Merged - Query Editor

So now we have a cookie text string in the format we’ll need for a query, using cookies that we have harvested from Internet Explorer.

Using the cookies in an actual Web query is quite straight forward. e.g.

In the next post, I’ll demonstrate how to use this with a real web site.

How to parse Internet Explorer cookies using Power Query

One valuable approach when dealing with complex authentication schemes is to use the cookies that your browser has already captured.

This post will concentrate on Internet Explorer.  I will do future posts on Chrome and Firefox as they are a bit different.

Under Windows 7, IE stores its cookies as a series of text files in this directory:

C:\Users\<username>\AppData\Roaming\Microsoft\Windows\Cookies\

Windows 8 and Windows 8.1 store the cookies here:.

C:\Users\<username>\AppData\Local\Microsoft\Windows\INetCookies

I haven’t installed Windows 10 yet and can’t find a definitive answer on the web regarding where it stores its cookies, although I suspect the file format is identical.

Let’s start by writing a Folder Query and pointing it at that folder.

Power Query->From File->From Folder

2016-01-15 15_38_24-IE Cookies - Query Editor

There can be other files in this folder.  So it is best to filter any files that do not have an extension of .txt.  You can do this by clicking on the Filter button for the Extension column and unselecting anything that isn’t .txt.  I had some .dat files in my folder,

We only need the Name and Folder Path columns.  Remove the others by selecting Name and Folder Path and using the Remove Other Columns button from the ribbon.

Finally, merge Name and Folder Path into one column using Transform->Merge Columns.

The result should look like this:

2016-01-15 15_46_31-IE Cookies - Query Editor

Now we need to write a function  which will take each cookie file and translate into a combined spreadsheet.

Let’s pull in one of the files and see what it looks like.

Power Query-> From File-> From Text

Enter the appropriate folder e.g.

C:\Users\<username>\AppData\Roaming\Microsoft\Windows\Cookies\

and select one of the files.

2016-01-15 15_57_17-0YZW00TD - Query Editor

 

We’ve got are a series of stacked tables with 9 rows each.  We need to turn those into a more regular format with columns and rows.  There are a number of methods for transposing these in Power Query, but let’s use the method that Ken Puls and Miguel Escobar describe in their book “M is for Data Monkey“.

There are 5 steps –

  1. Add an index column starting at 0
  2. Add a custom column that returns a modulo of the index field by the number of rows in each table.  In our case – 9.
  3. Then carefully Pivot by this field selecting Do Not Aggregate in the advanced options.
  4. Fill upwards
  5. Remove null entries in your first column

I refer you to Ken and Miguel’s book for a detailed explanation of this approach.  However, the code looks like this:

A quick bit of web research explains what each of these fields actually represents:

  1. Cookie name
  2. Cookie value
  3. Issuing Web site
  4. Flags
  5. Expiration Time (most significant)
  6. Expiration Time (least significant)
  7. Creation Time (most significant)
  8. Creation Time (least significant)
  9. Record delimiter

The date/time fields require a bit of work to be useful.

The two fields for Expiration Date need to be combined in order to represent a complete date field in Microsoft’s FILETIME format,  We can do this with the following formula:

([#"Expiration time (high)"] * Number.Power(2,32)) + [#"Expiration time (low)"])

We multiply the high value by 2^32 and add the low value.  This returns a value that represents the full datetime.  We can then convert it to a regular datetime using DateTime.FromFileTime

We need to repeat this for the cookie’s creation date.

The Flags fields is a bit mask for various cookie options.  Unfortunately, as far as I can tell, Power Query doesn’t provide a simple way of converting the integer value into a bitmask, I may take a look at this at a future date but it is not critical for our purpose.

We now have a Query that can parse a cookie file.

If we turn it into a function we can then use it with our Cookie Folder query to produce a complete spreadsheet of all of our IE cookies,.

We do this by adding a Custom column to our Folder Query which calls our Cookie Parse function.

The end result should look something like this:

2016-01-15 16_13_27-IE Cookies - Query Editor

And here’s the full solution code:

First the qProcessCookie function:

And then the IE Cookies folder query:

In my next post, I’ll describe how you can load your cookies into a query that you send to a web page in Power Query.

This is going to allow us to download data from web sites that have more complex authentication requirements than Power Query can currently handle.

Any comments or questions are welcome.