Download html from website

mightycpa
Automation Wizard
Posts: 343
Joined: Mon Jan 12, 2004 4:07 pm
Location: Vienna, VA

Post by mightycpa » Mon Nov 10, 2008 11:06 pm

Hi,

It is me again...

I'm wondering how to save the HTML of a page loaded in an IE object. Here is the code I have so far:

Code:

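// Generated by MacroScript WebRecorder 2.05
// Recorded on Monday, November 10, 2008, at 12:18 PM

// CORRECTED BY ME_AGAIN November 10, 2008

// Move the mouse cursor out of harm's way so that mouseover events don't interrupt the script
MouseMove>0,0
Let>delay=1
// Create a new Internet Explorer instance; its handle is stored in IE[0]
IE_Create>0,IE[0]

// Open the start page and wait until it has finished loading
IE_Navigate>%IE[0]%,www.sportsinsights.com,r
IE_Wait>%IE[0]%,r
// Pause one more second (the value of delay) as a safety margin
Wait>delay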

So, that delivers us to the page just fine. Now I want to save the HTML from this page to a file on my local drive.

I don't see enough information about the IE_* functions to know how to do that.

Do you have any ideas?


Tx,

George
"A facility for quotation covers the absence of original thought." - Lord Peter Wimsey

Me_again
Automation Wizard
Posts: 1101
Joined: Fri Jan 07, 2005 5:55 pm
Location: Somewhere else on the planet

Post by Me_again » Tue Nov 11, 2008 12:05 am

You need to look at the HTTPRequest command in the Help; it is designed for this purpose.

Note that not all pages have simple structures such that the text as displayed can be captured.

Also be sure that the automated capture of the site content that you are attempting doesn't violate the terms of use.
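For comparison, the single request-to-file idea looks like this outside Macro Scheduler. This is a minimal Python sketch using only the standard library, not HTTPRequest itself: it performs one GET and writes the raw response bytes to a local file. The helper name save_page and its parameters are hypothetical. It saves the HTML the server sends, which can differ from what the browser shows after page scripts run.

```python
import urllib.request
from pathlib import Path


# Hypothetical helper for illustration; not a Macro Scheduler command.
def save_page(url: str, out_path: str, timeout: float = 30.0) -> int:
    """GET url once and write the raw response body to out_path.

    Redirects are followed automatically by urllib.
    Returns the number of bytes written.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "save-page-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()  # bytes exactly as the server sent them
    Path(out_path).write_bytes(body)
    return len(body)
```

For example, save_page("http://www.sportsinsights.com/", "page.html") would fetch the start page from the original script (the output filename is only an illustration). A single GET like this carries no login session, which matters for pages behind a password.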

Post by mightycpa » Tue Nov 11, 2008 12:49 am

Hi,

Thanks, ME_AGAIN. I didn't mean to steal your identity at the beginning of this post.

I guess I didn't explain myself well. The code I originally included opens a browser window and navigates to whatever page I direct it to. Eventually, it lands on a target page.

At this point, I've got an open browser window with html in it.

The question is, how can I get this HTML into a file on my hard drive?

Of course, if there is a different way to implement IE_Navigate so that it redirects the GET response to a file, that would be fine too.

Now, ME_AGAIN has suggested that I use the HTTPRequest command. Ordinarily, I think that would be a great suggestion. But I'm afraid it will fail, because the page I want is behind a username/password login that redirects to an interim page. Here are the steps required when operating a browser:

1. Navigate to the start page
2. POST the login form
3. Redirect to the interim page
4. GET request to the target page

From what I read in the documentation, HTTPRequest doesn't seem to support a sequence of page requests; it is a single-request command. So it seems HTTPRequest could handle logging in, but not the GET request that follows, because that second request would not appear to be logged in. The documentation doesn't mention what happens to the cookies or session variables set at login, and I doubt they persist between calls.
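The usual way to stay logged in across several requests is a cookie jar shared by all of them: the session cookie set during login is sent back automatically on every later request. The Python sketch below shows that pattern with the standard library (http.cookiejar plus urllib.request), mapped onto the four browser steps listed earlier. It assumes the site keeps its session in cookies and uses a plain HTML login form. The function name login_then_save and the form field names you pass in are hypothetical, and nothing here says whether Macro Scheduler's HTTPRequest keeps cookies between calls.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from pathlib import Path


# Hypothetical helper for illustration; not a Macro Scheduler command.
def login_then_save(start_url, login_url, form_fields, target_url, out_path,
                    timeout=30.0):
    """Replay a browser-style login, then save the target page to out_path.

    All requests share one CookieJar, so a session cookie set in steps 1-3
    is sent again in step 4. Returns the raw bytes of the target page.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # Step 1: load the start page; some sites set a cookie before login.
    with opener.open(start_url, timeout=timeout) as resp:
        resp.read()

    # Steps 2-3: POST the login form (form_fields maps input names to values).
    # urllib follows a 301/302/303 redirect after a POST by issuing a GET,
    # and stores any Set-Cookie headers it sees along the way.
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    with opener.open(login_url, data=data, timeout=timeout) as resp:
        resp.read()

    # Step 4: GET the target page; the jar supplies the session cookie.
    with opener.open(target_url, timeout=timeout) as resp:
        body = resp.read()

    Path(out_path).write_bytes(body)
    return body
```

A call would look like login_then_save(start, login, {"username": "me", "password": "secret"}, target, "target.html"), with the real field names read from the login form's input tags. Many sites answer a failed login with an ordinary 200 page rather than an error, so check the saved file. If the site sets up its session with JavaScript instead of plain cookies, this approach will not be enough.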

All that said, I'm going to try it, because ME_AGAIN is a proven winner when it comes to this Macro Scheduler stuff, and it would not be smart to just discard this advice.

So, if you have any suggestions about how to stay logged in after an HTTPRequest call, I'm all ears.

Tx,

George

Post by Me_again » Tue Nov 11, 2008 1:10 am

Well, I did say:
Me_again wrote: "Note that not all pages have simple structures such that the text as displayed can be captured."
and this may be one of them.

Post by mightycpa » Tue Nov 11, 2008 2:26 pm

Yes, you did... I lack the imagination needed to get to that third page using HTTPRequest.

I might have to fall back on the old "Send Keys" trick: view the source and save the file.

I'm open to any other suggestions.

Tx,

George

Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Post by Marcus Tettmar » Tue Nov 11, 2008 3:15 pm

Use the ExtractTag function!
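Presumably the point of this suggestion is that tag extraction pulls the contents of one element out of the page already loaded in the IE instance, so the browser has already handled the login. As a rough illustration of the extraction idea only (this is not Macro Scheduler's ExtractTag, whose parameters aren't shown in this thread), the Python sketch below returns the text inside the n-th occurrence of a tag name in an HTML string, allowing for nested elements with the same name. extract_tag_text is a hypothetical helper, and it does not cope with omitted end tags such as an unclosed td or p.

```python
from html.parser import HTMLParser


# Hypothetical helper for illustration; not Macro Scheduler's ExtractTag.
class _TagTextCollector(HTMLParser):
    """Collects the text inside the index-th (0-based) occurrence of a tag."""

    def __init__(self, tag, index):
        super().__init__()  # convert_charrefs=True: &amp; etc. become characters
        self.tag = tag.lower()
        self.index = index
        self.seen = 0       # start tags with this name encountered so far
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # True once the target element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != self.tag:
            return
        if self.depth:
            self.depth += 1      # same-name element nested inside the target
        elif self.seen == self.index:
            self.depth = 1       # entering the target element
        self.seen += 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_tag_text(html, tag, index=0):
    """Return the text of the index-th <tag> element, or None if absent."""
    parser = _TagTextCollector(tag, index)
    parser.feed(html)
    parser.close()
    if parser.seen <= index:
        return None
    return "".join(parser.parts).strip()
```

For example, with a table of odds on the page, extract_tag_text(page_html, "td", 2) would return the text of the third table cell. The input can be the HTML text of the page however it was obtained.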
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
