Reading files stored on google documents

Technical support and scripting issues

Moderators: JRL, Dorian (MJT support)

Post Reply
obfusc88
Pro Scripter
Posts: 88
Joined: Wed Mar 14, 2007 6:22 pm

Reading files stored on google documents

Post by obfusc88 » Tue Jun 22, 2010 7:58 pm

I have an email account on google.

I went to tools for shared documents and saved a text file there. It is owned by me and shared by me, no other settings. File name is RemoteFile.txt. I can view the text document when I am logged in.

I am trying, with no success to use HTTP to copy file contents as a variable to my local system.

Code: Select all

Let>RemoteName=MyName
//Let>[email protected]

Let>RemotePwd=MyPassword

Let>RemoteURL=https://docs.google.com/document/edit?id=1pmZKqIV9nt-Irm16WoUXyjEKQhX0oEceef8Fpn4NPoA&hl=en#

//Let>RemoteURL=Let>RemoteURL=https://%RemoteUser%:%RemotePwd%@docs.google.com/document/edit?id=1pmZKqIV9nt-Irm16WoUXyjEKQhX0oEceef8Fpn4NPoA&hl=en&pli=1

HTTPRequest>%RemoteURL%,,GET,,RemoteFileText,,
Neither of the lines work for RemoteName or the RemoteURL with/without the name/pwd/

I just get a 404 error for RemoteFileText.
=============================
If I use the browser and just copy the URL, I am brought to a login screen and prompted for the name and password, and then the document opens up for reading. But I cannot get the syntax correct for HTTPRequest/GET. I also tried some variations of FTPGetFile with luck. Really want to use HTTP, Want to work in background and not create any files on local system, just want the variable for parsing and processing.

Suggestions, please?

adroege
Automation Wizard
Posts: 438
Joined: Tue Dec 07, 2004 7:39 pm

Post by adroege » Tue Jun 22, 2010 8:13 pm

I have a theory that the file does NOT exist as a static file which can be requested by a HTTPRequest - but rather that it is "generated" on-the-fly by some sort of database query run on google's servers.

What I would try then is while in Google Documents look for the button over on the extreme right titled "Share". Click this and you should see an option -> "Publish as web page". This should return you a URL that you can then fetch with HTTPRequest.

Here is what Google says when I click the "Publish as web page" button:
You can publish your document to the Internet, where anyone will be able to access and view it online. Your document will be assigned a unique address (URL) on google.com that you can send to your friends and colleagues.

obfusc88
Pro Scripter
Posts: 88
Joined: Wed Mar 14, 2007 6:22 pm

Post by obfusc88 » Tue Jun 22, 2010 9:51 pm

Good suggestion.

I found that Ihad crated the file as a private file, so I first hd to change it for access to anyone with the link. I could not find the button to publish to the web, but the link was provided. I logged out of gmail, entered the link into the URL and the page was there. I checked the source code and could find the text portion of the document.

The link is: https://docs.google.com/document/edit?i ... y=COrdlJoL
You can go there from this posting to see the five test lines.

This script does not work, results in 404 error:

Code: Select all

Let>RemoteURL=https://docs.google.com/document/edit?id=1pmZKqIV9nt-Irm16WoUXyjEKQhX0oEceef8Fpn4NPoA&hl=en&authkey=COrdlJoL

HTTPRequest>%RemoteURL%,,GET,,RemoteFileValue,,
Perhaps it is going through some type of proxy? If Yes, any idea of the extra HTTP values? AHA, I just noticed it is an https prefix, does that prevent HTTPRequest from working?
Last edited by obfusc88 on Tue Jun 22, 2010 10:14 pm, edited 1 time in total.

adroege
Automation Wizard
Posts: 438
Joined: Tue Dec 07, 2004 7:39 pm

Post by adroege » Tue Jun 22, 2010 10:14 pm

Copied this from the help file:

For SSL (https) connections set HTTP_SSL to 1. To use SSL you will first need to install the OpenSSL libraries. See:

http://www.openssl.org/related/binaries.html

obfusc88
Pro Scripter
Posts: 88
Joined: Wed Mar 14, 2007 6:22 pm

Post by obfusc88 » Tue Jun 22, 2010 10:15 pm

Thanks for that. Will try later tonight, sounds promising....

obfusc88
Pro Scripter
Posts: 88
Joined: Wed Mar 14, 2007 6:22 pm

Post by obfusc88 » Tue Jun 22, 2010 11:32 pm

Tried to install OpenSSL, but needed to install Visual c++ 2008 redistributables first. Then installed Open SSL, was prompted to go to PayPal for a $ donation. Did not make a donation at this point, don't know if problem will be resolved. I guess these installations will be required on all machines that use my final script. Plan was for compiled program, will these libraries be included in the compiled script? Will I need to make a donation for each installation?

Anyway, ran script again,

Code: Select all

Let>HTTP_SSL=1
Let>RemoteURL=https://docs.google.com/document/edit?id=1pmZKqIV9nt-Irm16WoUXyjEKQhX0oEceef8Fpn4NPoA&hl=en&authkey=COrdlJoL
HTTPRequest>%RemoteURL%,,GET,,RemoteFileValue,,
Different error this time: 500 HTTP/1.1 500 Internal Server Error
Doing a Google on the error message......

Link: https://docs.google.com/document/edit?i ... y=COrdlJoL

obfusc88
Pro Scripter
Posts: 88
Joined: Wed Mar 14, 2007 6:22 pm

Post by obfusc88 » Wed Jun 23, 2010 12:08 am

Created a loop, waits 3 seconds after error, then tries again. After 25 tries, loop ends. Ran loop three times, got 25 "error 500" failures every time. According to HELP files Timeout is not an issue, there is no Timeout value.

Code: Select all

Let>vCount=0
Label>Start
Let>vCount=%vCount%+1
Let>HTTP_SSL=1
Let>RemoteURL=https://docs.google.com/document/edit?id=1pmZKqIV9nt-Irm16WoUXyjEKQhX0oEceef8Fpn4NPoA&hl=en&authkey=COrdlJoL
HTTPRequest>%RemoteURL%,,GET,,RemoteFileValue,,
MidStr>%RemoteFileValue%,1,3,vCode
If>%vCode%=500,Delay,Done
Label>Delay
Message>Failed %vCount% times
Wait>3
If>%vCount%=25,End,Start

Label>Done
Message>Got the file:%RemoteFileValue%
Label>End
Exit>0
Other suggestions?

adroege
Automation Wizard
Posts: 438
Joined: Tue Dec 07, 2004 7:39 pm

Post by adroege » Wed Jun 23, 2010 1:13 am

What version of Macro Scheduler do you have? I have version 10 and SSL support is not in my version. It was added either in 11 or 12 (not sure which).


Another possibility is that Google is expecting a specific header data value, and HTTPRequest doesn't send anything but the bare basics.
When your browser requests a web page from a server via HTTP, it sends a set of headers with various bits of information about itself. Below you can see the headers sent by your browser.
http://pgl.yoyo.org/http/browser-headers.php
If that's the case, then you will need to automate the web requests using VBScript and the IE object. WebRecorder can help, or you can lookup examples here on the forum and in the "scripts and tips" archive.

Or look into the other option I suggested -- "Publish as web page" from within Google Docs.

obfusc88
Pro Scripter
Posts: 88
Joined: Wed Mar 14, 2007 6:22 pm

Post by obfusc88 » Wed Jun 23, 2010 3:17 am

I am using 11.1.20.

I did finally find the button to Publish as Web Page. The resulting page is http vs. https, but still returns "error 500".

Code: Select all

Let>RemoteURL=http://docs.google.com/document/pub?id=1pmZKqIV9nt-Irm16WoUXyjEKQhX0oEceef8Fpn4NPoA
HTTPRequest>%RemoteURL%,,GET,,RemoteFileValue,,
I will probably have to move from Google as the document storage area, must be able to grab with HTTPRequest.

Thanks again for working on this with me.

adroege
Automation Wizard
Posts: 438
Joined: Tue Dec 07, 2004 7:39 pm

Post by adroege » Wed Jun 23, 2010 9:52 am

This works for me. Give it a try and let us know if it works for you.


***EDIT**** Well I just found it only started working AFTER I had opened the same URL in my browser..... weird

Code: Select all

Let>Remote=http://docs.google.com/document/pub?id=1pmZKqIV9nt-Irm16WoUXyjEKQhX0oEceef8Fpn4NPoA
HTTPRequest>Remote,,GET,,RemoteFileValue,,
MessageModal>RemoteFileValue

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Wed Jun 23, 2010 10:27 am

Try this:

Code: Select all

VBSTART
Function HTTPGet(URL)
  Set IE = CreateObject("InternetExplorer.Application")
  IE.visible = 0
  IE.navigate URL
  do while IE.Busy
  loop
  HTTPGet = IE.document.documentelement.outerhtml
  IE.quit
  Set IE = Nothing
End Function
VBEND

VBEval>HTTPGet("http://docs.google.com/document/pub?id=1pmZKqIV9nt-Irm16WoUXyjEKQhX0oEceef8Fpn4NPoA"),HTMLResult
MessageModal>HTMLResult
Google could be wanting to set a cookie. There's also some javascript. So that may be why HTTPRequest won't work - it could be looking for a real browser. So the above, which uses IE in the background, works.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

obfusc88
Pro Scripter
Posts: 88
Joined: Wed Mar 14, 2007 6:22 pm

Post by obfusc88 » Thu Jun 24, 2010 4:28 pm

Just saw your posting Marcus. I will try it later. Thank you for your suggestion.

obfusc88
Pro Scripter
Posts: 88
Joined: Wed Mar 14, 2007 6:22 pm

Post by obfusc88 » Thu Jun 24, 2010 6:45 pm

Thank you Marcus. The script worked perfectly, got the page into the variable. The only problem I see is that it took over 30 seconds to get the simple page of 5 sample lines. Any way to speed that up?

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Thu Jun 24, 2010 7:28 pm

It took less than a second for me.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

obfusc88
Pro Scripter
Posts: 88
Joined: Wed Mar 14, 2007 6:22 pm

Post by obfusc88 » Sat Jun 26, 2010 12:31 am

Enabled logging.
Ran 12 tests, average time is about 14.4 seconds.
Quickest was 3 seconds, longest was 48 seconds.

Will have to live with it.
Thanks for the solution, that is what really mattered.

Post Reply
Sign up to our newsletter for free automation tips, tricks & discounts