Unfortunately, Power Query often does not reliably maintain the cookies needed to parse data from complex sites. However, it is possible to sidestep this issue by passing through the cookies manually.
In a previous post, I demonstrated how to parse Internet Explorer cookie files and bring them into Excel.
In this post, we’ll transform those cookies into a cookie string and add to a query.
Cookies need to be specified in the following format:
Cookie1=value1; Cookie2=value2; Cookie3=value3;
This is quite different from the table of cookies that we had at the conclusion of my previous post.
But we can use Power Query to fix that.
Start with a new Blank Query and refer to the #”IE Cookies” query that we had at the end of the previous exercise.
Source = #"IE Cookies"
Then filter to just the cookies needed.
Do this by filtering the Host/path column for those cookies corresponding with the domain of the website that you are parsing e.g. flipboard.com.
(NB. This is only going to work for sites that maintain cookies between sessions. Often these sites will have a “Remember Me” box that you can tick.)
Having done this, remove other columns except for Cookie Name and Cookie value.
Now to convert our table into a cookie string for our query:
Merge the columns using “=” as our separator and remove other columns
Transpose the remaining Merged column into a single row
Merge these columns using “; ” (note the space) as the separator and remove other columns
So now we have a cookie text string in the format we’ll need for a query, using cookies that we have harvested from Internet Explorer.
Using the cookies in an actual Web query is quite straight forward. e.g.
Source = Web.Contents("http://mywebpage.com", [Headers=[Cookie=CookieString]])
In the next post, I’ll demonstrate how to use this with a real web site.