Getting Data from Google Docs into Excel

With the growing use of Google Docs and other cloud based systems, we face the problem of how to get ‘live’ data from, for example, a Google Docs spreadsheet into Excel. This article pulls together a whole bunch of other work from previous articles on Google Visualization, Excel to jSon Serialization and various other related topics, and will end up with a downloable tool to get data right out of google docs.

 

Best Microsoft MCTS Training – Microsoft MCITP Training at Certkingdom.com

Getting data out of Google Docs into Excel

In previous related articles such as Excel-jSon conversion, Google Visualization from Excel, and abstracting data in Excel, I began to look at how to get better integration between web based and pc based data. In this article, we’ll bring together all that work and provide a capability to get data live right out of Google Docs, or Google Gadgets, and into Excel. We’ll end up with a downloadble set of reusable classes and an example application all of which you can download for your own use.
Google Wire Protocol

Google has implemented something they call the ‘wire protocol’ to provide data feeds between Google Docs, Gadgets and so on. It’s jSon-like in fomat, and is used to describe data tables, and would be provided as an https: response to an request to an appropriately structured https: request. For this article we will use some data in this GoogleDocs Spreadsheet, which I have been using to demonstrate Roadmap Generation both in VBA and javascript versions.

Data Source URL
In Google Docs, if you create a table gadget for example, examining the gadget’s properties will give you the data source URL that will provoke a wire protocol response. In the case of the example sheet, you will see that the URL given on the table gadget is

https://spreadsheets.google.com/a/ mcpher.com/spreadsheet/tq?range=A1:H14&key=0At2ExLh4POiZdHBJYnlwaERpYTRZSWl1VEw1TEpQVkE&gid=0

If you enter that into a browser (Chrome or firefox – as usual IE does something strange) , you will see the Google Wire response below

Our task then, is to decode this, and populate an excel sheet on demand with this kind of data. The good thing is that most of the code is available from the work related to the previous articles already mentioned

Data Conversion and Spreadsheet Population
Conversion of jSon to Excel is covered here, however with Google Wire protocol we are not exactly looking at jSon, and neither is the structure a simple two dimensional table that the capability we have already developed is expecting to find. In order to fully utilize all the components that have already been developed, the data needs some preparation.

* Leverage a cBrowser object to deal with the https: request/response.
* Extract out the {table:[]} part of the google wire response
* Convert it to a more jSon friendly ( change the new Date() javascript construct to a proper date, ‘quote’ the key names etc). We’ll use Regular Expressions for that, then deserialize it using Excel JSON conversion
* Crush it down into a 2 dimensional table. We’ll need to write new something for that.
* Use the Data Abstraction cDataSet to populate our Excel sheet.

These classes also use a number of Excel additional references which are already selected in the example cDataSet worksheet you can download. These are the references that are needed.

Implementation example
This is the module that will do the work. Likely a real implementation would provide some kind of dialogue to collect the request URL and the output range , and then execute the data transfer and conversion. In this example, the data in the Google Docs Sheet will be copied to the Excel worksheet name ‘Clone’. If you just want to use this as is, then you are good to go. Just create a module of a similar structure as below. Read on further to find out how it all works, or if you want to harden up the code against data structure errors.

Option Explicit
Const Url = “https://spreadsheets.google.com/a/mcpher.com/spreadsheet/tq?range=A1:H14&key=0At2ExLh4POiZdHBJYnlwaERpYTRZSWl1VEw1TEpQVkE&gid=0”
Public Sub googleWireExample()
Dim dSet As cDataSet, dsClone As cDataSet, jo As cJobject, cb As cBrowser
Dim sWire As String
‘ get the google wire string
Set cb = New cBrowser
sWire = cb.httpGET(Url)
‘ load to a dataset
Set dSet = New cDataSet
With dSet
.populateGoogleWire sWire, Range(“Clone!$a$1”)

If .Where Is Nothing Then
MsgBox (“No data to process”)
Else
‘ it worked

End If
End With
Set dSet = Nothing
Set cb = Nothing
End Sub

The https: request/response
This is rather straightforward at least, using the already available cBrowser object.

Set cb = New cBrowser
sWire = cb.httpGET(Url)

Public Function httpGET(fn As String) As String
pHtml = fn
Dim oHttp As Object
Set oHttp = CreateObject(“Microsoft.XMLHTTP”)
Call oHttp.Open(“GET”, pHtml, False)
Call oHttp.Send(“”)
httpGET = oHttp.ResponseText
Set oHttp = Nothing
End Function

This gets us the google wire string containing the spreadsheet data

The populateGoogleWire method
This is a method of the cDataSet class, and can be used to abstract data from Google Wire Protocol, and dump it to a sheet.

Public Function populateGoogleWire(sWire As String, rstart As Range, Optional wClearContents As Boolean = True) As cDataSet
Dim jo As cJobject, s As String, p As Long, e As Long, joc As cJobject, jc As cJobject, jr As cJobject, cr As cJobject
Dim jt As cJobject, v As Variant, astring As Variant

Const jSTart = “table:”
Const jEnd = “);”

‘ take a google wire string and apply it to a range
p = InStr(1, sWire, jSTart)
e = InStr(1, sWire, jEnd)

If p = 0 Or e = 0 Or p > e Then
MsgBox ” did not find table definition data”
Exit Function
End If
‘ encode the ‘table:’ part to a cjobject
p = p + Len(jSTart)
s = “{” & jSTart & “[” & Mid(sWire, p, e – p – 1) & “]}”
‘ google protocol doesnt have quotes round the key of key value pairs,
‘ and i also need to convert date from javascript syntax new Date()
s = rxReplace(“(new\sDate)(\()(\d+)(,)(\d+)(,)(\d+)(\))”, s, “‘$3/$5/$7′”)
s = rxReplace(“(\w+)(:)”, s, “‘$1’:”)
‘ this should return an object as follow
‘ {table:[ cols:[c:[{id:x,label:x,pattern:x,type:x}] , rows:[ c:[(v:x,f:x}] ]}
Set jo = New cJobject
Set jo = jo.deSerialize(s, eDeserializeGoogleWire)
‘need to convert that to cdataset:[{label:”x”,,,},{},,,]
‘column labels can be extracted then from jo.child(“1.cols.n.label”)  .. where ‘n’= column number

Set joc = New cJobject
Set cr = joc.init(Nothing, cJobName).AddArray
For Each jr In jo.Child(“1.rows”).Children
With cr.add
For Each jc In jo.Child(“1.cols”).Children
Set jt = jr.Child(“c”).Children(jc.ChildIndex)
‘ sometimes there is no “v” if a null value
If Not jt.ChildExists(“v”) Is Nothing Then
Set jt = jt.Child(“v”)
End If
If jc.Child(“type”).toString = “date” Then
‘ month starts at zero in javascript
astring = Split(jt.toString, “/”)
If LBound(astring) <= UBound(astring) Then
v = DateSerial(CInt(astring(0)), CInt(astring(1)) + 1, CInt(astring(2)) + 1)
Else
v = Empty
End If
Else
v = jt.Value
End If
.add jc.Child(“label”).toString, v
Next jc
End With
Next jr
Set populateGoogleWire = populateJSON(joc, rstart, wClearContents)

End Function

Extracting the table[]
As with most of the code in this example, we need to still add a bunch of error handling, so please beef that up if you implement any of these modules. For now we are assuming that the wire string is valid and contains a data table.

Const jSTart = “table:”
Const jEnd = “);”

‘ take a google wire string and apply it to a range
p = InStr(1, sWire, jSTart)
e = InStr(1, sWire, jEnd)

If p = 0 Or e = 0 Or p > e Then
MsgBox ” did not find table definition data”
Exit Function
End If
‘ encode the ‘table:’ part to a cjobject
p = p + Len(jSTart)
s = “{” & jSTart & “[” & Mid(sWire, p, e – p – 1) & “]}”

Cleaning up the jSon
The Wire Protocol needs a bit of work to fix quoting of key names and regularization of date format, but it’s going to be a piece of cake with the regEx capability.

‘ google protocol doesnt have quotes round the key of key value pairs,
‘ and i also need to convert date from javascript syntax new Date()
s = rxReplace(“(new\sDate)(\()(\d+)(,)(\d+)(,)(\d+)(\))”, s, “‘$3/$5/$7′”)
s = rxReplace(“(\w+)(:)”, s, “‘$1’:”)

Deserializing the cleaned up jSon
Now we can get to a cJobject we can work with.

‘ this should return an object as follow
‘ {table:[ cols:[c:[{id:x,label:x,pattern:x,type:x}] , rows:[ c:[(v:x,f:x}] ]}
Set jo = New cJobject
Set jo = jo.deSerialize(s, eDeserializeGoogleWire)

Crush down to 2 dimensional table
As you can see the wire protocol is a bit verbose, and will have been deserialized into a cJobject with too many layers, so our challenge here is now to move to something 2 dimensional that we can represent in Excel.

‘need to convert that to cdataset:[{label:”x”,,,},{},,,]
‘column labels can be extracted then from jo.child(“1.cols.n.label”)  .. where ‘n’= column number

Set joc = New cJobject
Set cr = joc.init(Nothing, cJobName).AddArray
For Each jr In jo.Child(“1.rows”).Children
With cr.add
For Each jc In jo.Child(“1.cols”).Children
Set jt = jr.Child(“c”).Children(jc.ChildIndex)
‘ sometimes there is no “v” if a null value
If Not jt.ChildExists(“v”) Is Nothing Then
Set jt = jt.Child(“v”)
End If
If jc.Child(“type”).toString = “date” Then
‘ month starts at zero in javascript
astring = Split(jt.toString, “/”)
If LBound(astring) <= UBound(astring) Then
v = DateSerial(CInt(astring(0)), CInt(astring(1)) + 1, CInt(astring(2)) + 1)
Else
v = Empty
End If
Else
v = jt.Value
End If
.add jc.Child(“label”).toString, v
Next jc
End With
Next jr

Push abstracted data to the target worksheet
Now that we have a cJobject in the correct structure, we can use the cDataSet.populateJSON (already covered here) method to push the data to the target worksheet

Set populateGoogleWire = populateJSON(joc, rstart, wClearContents)

When serialized this simplified jSon structure now would be

{“cDataSet”:[{“Activate”:””,”Deactivate”:”6/21/2007″, “Description”:”Cellphone”,”ID”:”cell”,”Target”:”smart”,”Custom”:””,”Cost”:”200″,”Callout”:””} ,{“Activate”:”1/7/2006″,”Deactivate”:”3/22/2011″, “Description”:”Mac”,”ID”:”mac”, “Target”:”iPad”,”Custom”:””,”Cost”:”300″,”Callout”:””},{“Activate”:””,”Deactivate”:”3/21/2012″, “Description”:”PC”,”ID”:”pc”,”Target”:”android”,”Custom”:””,”Cost”:”250″,”Callout”:””},{“Activate”:”1/2/2009″, “Deactivate”:”11/7/2011″, “Description”:”Netbook”,”ID”:”net”, “Target”:”pc”,”Custom”:”dislike”, “Cost”:”150″,”Callout”:”this was a waste of money”}, {“Activate”:”1/2/2006″,”Deactivate”:”2/2/2010″, “Description”:”Windows Phone”,”ID”:”smart”,”Target”:”iphone”,”Custom”:””, “Cost”:”240″,”Callout”:”Windows Mobile was so bad”},{“Activate”:”9/3/2010″,”Deactivate”:””, “Description”:”Tablet”,”ID”:”tablet”,”Target”:””,”Custom”:”family”,”Cost”:””,”Callout”:””},{“Activate”:”10/29/2003″,”Deactivate”:”6/13/2008″, “Description”:”MP3 player”,”ID”:”mp3″,”Target”:”ipod”, “Custom”:””,”Cost”:”20″,”Callout”:””},{“Activate”:”10/30/2004″,”Deactivate”:”1/2/2009″, “Description”:”Personal Video”,”ID”:”pv”,”Target”:”ipod”,”Custom”:””,”Cost”:”25″,”Callout”:””}, {“Activate”:”4/12/2008″,”Deactivate”:”6/2/2011″, “Description”:”iPod”,”ID”:”ipod”,”Target”:”ipad”,”Custom”:”like” ,”Cost”:”25″,”Callout”:””},{“Activate”:”1/5/2009″,”Deactivate”:””, “Description”:”iPhone”,”ID”:”iphone”,”Target”:”android”, “Custom”:”like”,”Cost”:”360″,”Callout”:””},{“Activate”:”11/2/2010″,”Deactivate”:””, “Description”:”iPad”,”ID”:”ipad”,”Target”:”tablet”, “Custom”:”like”,”Cost”:”480″,”Callout”:””},{“Activate”:”12/13/2009″,”Deactivate”:””, “Description”:”Android/Chrome”,”ID”:”android”,”Target”:””, “Custom”:””,”Cost”:”400″,”Callout”:””},{“Activate”:”12/3/2005″,”Deactivate”:”12/12/2007″, “Description”:”windows tablet”,”ID”:”wtab”,”Target”:”pc”, “Custom”:”dislike”,”Cost”:””,”Callout”:”this really sucked”} ]}

Summary
You can get all the code of these downloadable tools, which I hope you can find a use for as the basis for your own application. As always I welcome suggestions, improvements, questions and bug fixes at my forum.