HTTPRequest returns "403 HTTP/1.1 403 Forbidden"

Technical support and scripting issues

camisade
Newbie
Posts: 7
Joined: Thu Aug 28, 2003 3:20 pm

Post by camisade » Sat Jan 01, 2011 5:59 pm

I'm trying to use the HTTPRequest command. The MJT sample code for this function (which sends a query to Google) runs fine from my workstation, as does the use of this function to pull a basic html page from the web server I'm trying to hit.

However, when I use HTTPRequest to hit a PHP page on that same web server, which works fine in a browser, Macro Scheduler returns "403 HTTP/1.1 403 Forbidden" as the response from the web server.

This is a straight http request, no SSL, no authentication, no redirects to follow, etc.

So, why would Macro Scheduler be treated differently from any other user agent, and why would the server's scripting engine refuse Macro Scheduler when the same URL is served fine to any browser?

Me_again
Automation Wizard
Posts: 1101
Joined: Fri Jan 07, 2005 5:55 pm
Location: Somewhere else on the planet

Post by Me_again » Sun Jan 02, 2011 5:38 pm

Can you post the URL of the page? (But please don't if it's an "adult" or other potentially offensive site.)

HTTPRequest can grab the HTML generated by .php pages. But if the site owner wants to stop content (copyrighted or otherwise) from being harvested automatically, which is what you are trying to do, there are many ways they can block it.

Post by camisade » Tue Jan 04, 2011 12:30 am

You can see this if you try to hit

http://www.ebookexchange.com/index.php

I'm a principal on that site, and nothing is being (purposefully) done to block any kind of requests, nor is there any PHP configuration setting that I'm aware of that would cause this.

Post by Me_again » Tue Jan 04, 2011 1:43 am

Thanks for the link. It appears that the server (Apache) is blocking the "Mozilla/3.0 (compatible; Indy Library)" user agent that Macro Scheduler's HTTPRequest sends. Most likely the block is in the .htaccess file and is designed to stop very old browsers from accessing the page (IE8 identifies itself as Mozilla/4.0 and Firefox as Mozilla/5.0).
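A rule like that compares the User-Agent request header against a pattern and answers 403 on a match. The minimal Python sketch below models that logic only; the patterns in `BLOCKED_UA_PATTERNS` are hypothetical, since the actual rule on this server was never posted and may not live in .htaccess at all.

```python
import re

# Hypothetical deny-list: the real rule on the server in this thread is unknown.
BLOCKED_UA_PATTERNS = [
    re.compile(r"^Mozilla/3\.0"),                # very old browsers
    re.compile(r"Indy Library", re.IGNORECASE),  # the Indy HTTP client component
]


def is_blocked(user_agent: str) -> bool:
    """True if any deny-list pattern matches the User-Agent header value."""
    return any(pattern.search(user_agent) for pattern in BLOCKED_UA_PATTERNS)


def status_for(user_agent: str) -> int:
    """Status a filter like this would return: 403 Forbidden if blocked, else 200."""
    return 403 if is_blocked(user_agent) else 200


if __name__ == "__main__":
    for ua in (
        "Mozilla/3.0 (compatible; Indy Library)",              # HTTPRequest's UA
        "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)",  # IE8
        "Wget/1.11.4",
    ):
        print(status_for(ua), ua)
```

Because the decision depends only on the header string, any client that sends a different User-Agent gets through, which matches what the wget test below found.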

I tested this by downloading the page with wget. It works fine unless I set the user agent to that same "Mozilla/3.0 (compatible; Indy Library)" string, in which case I get the same 403 error. (Note that I'm using Macro Scheduler version 9 and assuming the user agent has not changed.)
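The same test can be sketched in Python with the standard library's urllib. This is an illustration under assumptions: `build_request` and `fetch_status` are hypothetical helper names, and the URL is the one from this thread, which may no longer respond the way it did in 2011.

```python
import urllib.error
import urllib.request

URL = "http://www.ebookexchange.com/index.php"      # page from this thread
INDY_UA = "Mozilla/3.0 (compatible; Indy Library)"  # what HTTPRequest sent


def build_request(url: str, user_agent: str = "") -> urllib.request.Request:
    """GET request; if user_agent is empty, urllib's default User-Agent is used."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_status(req: urllib.request.Request, timeout: float = 30) -> int:
    """Send the request and return the HTTP status code, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 Forbidden


if __name__ == "__main__":
    print("default UA:", fetch_status(build_request(URL)))
    print("Indy UA:   ", fetch_status(build_request(URL, INDY_UA)))
```

A 200 for the default user agent followed by a 403 for the Indy string would confirm that the User-Agent header alone decides the outcome.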

I suggest automating wget for Windows instead. It has many more options (maybe too many) and works well with Macro Scheduler.

Post by camisade » Tue Jan 04, 2011 9:12 pm

So, if it's an Apache setting, why would it serve an html page from my web site, but return that "Forbidden" message when I try to fetch a PHP page? That makes it seem more like a PHP config setting, but I can find none that correspond to this kind of thing.

I'll have to look at wget -- but it's important to me that Macro Scheduler not have to open a new window or change the window focus on my desktop when running this script (which needs to happen periodically, automatically). Not sure if MS can do that with an external program.

Post by Me_again » Wed Jan 05, 2011 3:22 am

It's strange. It's possible the .php could have a redirect but of course I can't see the code. If you can see your .htaccess file take a look in there, or if you are running any 'bot blocker that would be another place to look.

Anyway, yes, Macro Scheduler can run a program hidden. Here's your example:

// Run hidden
Let>RP_WINDOWMODE=0
// Delete any earlier copy; otherwise wget saves the new download as index.php.1
IfFileExists>c:\macros\macdat\index.php
DeleteFile>c:\macros\macdat\index.php
EndIf
Run Program>c:\getw\wget.exe -t1 -T30 http://www.ebookexchange.com/index.php -P../macros/macdat/
// -t1 = try once, -T30 = 30 second timeout, -P../macros/macdat/ = store output in c:\macros\macdat
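For readers driving wget from Python instead, here is a rough equivalent of the script above. It assumes wget is installed at `c:\getw\wget.exe` as in this post, and `build_wget_argv` and `run_hidden` are hypothetical names. Passing the arguments as a list avoids shell quoting, and `subprocess.CREATE_NO_WINDOW` (Windows only, Python 3.7+) keeps a console window from appearing, much like `RP_WINDOWMODE=0`. It uses an absolute `-P` directory rather than the relative `../macros/macdat/`, so the result does not depend on the working directory.

```python
import os
import subprocess
from typing import List

WGET = r"c:\getw\wget.exe"  # install location used in the post above


def build_wget_argv(url: str, out_dir: str, tries: int = 1, timeout_s: int = 30) -> List[str]:
    """-t = number of tries, -T = timeout in seconds, -P = directory to save into."""
    return [WGET, f"-t{tries}", f"-T{timeout_s}", f"-P{out_dir}", url]


def run_hidden(argv: List[str]) -> int:
    """Run a console program without opening a window and wait for it to exit."""
    # CREATE_NO_WINDOW exists only on Windows (Python 3.7+); use 0 elsewhere.
    flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)
    return subprocess.run(argv, creationflags=flags).returncode


if __name__ == "__main__":
    out_dir = r"c:\macros\macdat"
    target = os.path.join(out_dir, "index.php")
    # Delete the previous copy; otherwise wget saves the new one as index.php.1
    if os.path.exists(target):
        os.remove(target)
    code = run_hidden(build_wget_argv("http://www.ebookexchange.com/index.php", out_dir))
    print("wget exit code:", code)
```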

Post by camisade » Fri Jan 07, 2011 4:12 pm

First, let me thank you very much for the help you've provided.

I have wget, and I can run it from the command line, and it works fine and does indeed work to get web pages from my site. It works without having to be run from the active directory, so it's finding the DLLs it needs.

Unfortunately, I can't get it to work from within MacroScheduler 12! I'm using:

RunProgram>C:\Program Files (x86)\GnuWin32\bin\wget.exe -t1 -T30 -o log.txt http://www.ebookexchange.com/index.php

Macro Scheduler does seem to be running wget (I'm testing it in windowed mode, but the window flashes and is gone too fast to read), but when run this way no log file or fetched file is written to disk anywhere.

It doesn't seem likely that MS is closing/killing the wget process so fast the files aren't getting written, but I don't know what else the problem could be.

Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Post by Marcus Tettmar » Fri Jan 07, 2011 4:24 pm

You have spaces in the path. Have you tried:

RunProgram>"C:\Program Files (x86)\GnuWin32\bin\wget.exe" -t1 -T30 -o log.txt http://www.ebookexchange.com/index.php
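To see why the quotes matter, the Python sketch below splits the unquoted command line on whitespace and then rebuilds it with `subprocess.list2cmdline`, an internal CPython helper (not part of the documented API) that applies the Microsoft C runtime quoting rules. It only illustrates quoting; how Macro Scheduler itself parses a `RunProgram>` line is not modeled here.

```python
import subprocess

WGET = r"C:\Program Files (x86)\GnuWin32\bin\wget.exe"
ARGS = ["-t1", "-T30", "-o", "log.txt", "http://www.ebookexchange.com/index.php"]

# Unquoted: splitting on whitespace cuts the program path at the first space.
unquoted = WGET + " " + " ".join(ARGS)
print(unquoted.split()[0])  # C:\Program

# Quoted: list2cmdline wraps any argument containing a space in double quotes.
print(subprocess.list2cmdline([WGET] + ARGS))
```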
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

Post by Marcus Tettmar » Fri Jan 07, 2011 4:31 pm

BTW - this works for me without having to use any third party files/apps and without having to save to file:

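VBSTART
Function HTTPGet(URL)
  ' Drive a hidden Internet Explorer instance to fetch the page
  Set IE = CreateObject("InternetExplorer.Application")
  IE.visible = 0
  IE.navigate URL
  ' Wait until IE has finished loading (readyState 4 = complete)
  Do While IE.Busy Or IE.readyState <> 4
  Loop
  ' Return the HTML of the whole document
  HTTPGet = IE.document.documentelement.outerhtml
  IE.quit
  Set IE = Nothing
End Function
VBEND
VBEval>HTTPGet("http://www.ebookexchange.com/index.php"),HTMLResult
MessageModal>HTMLResult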

I have also noted the issue with the user agent string. Web sites shouldn't really filter based on user agent, but nevertheless I'll see if we can change the user agent and/or make it overridable in a future release.
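For comparison, the same fetch-into-a-variable approach in Python needs neither Internet Explorer nor a temporary file. This is a sketch, not an MJT-supplied method: `http_get` is a hypothetical helper, and the default `Mozilla/5.0` user agent is an assumption chosen only because it is not the `Mozilla/3.0` string that was being refused.

```python
import urllib.request


def http_get(url: str, user_agent: str = "Mozilla/5.0", timeout: float = 30) -> str:
    """Fetch a URL and return the body as text, sending an explicit User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares; fall back to UTF-8 (an assumption).
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `http_get("http://www.ebookexchange.com/index.php")` returns the page source as a string; passing a different `user_agent` shows how a server reacts to it.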

Post by Me_again » Fri Jan 07, 2011 5:43 pm

Aha, never fails, whenever I post a workaround Marcus finds a better way :lol:

Post by Marcus Tettmar » Fri Jan 07, 2011 5:49 pm

I took that from the Scripts and Tips forum.

Post by camisade » Fri Jan 07, 2011 6:24 pm

Thanks SO much. The VB process worked for me. Clearly I need to haunt the Scripts and Tips forum a little more.

FWIW:
Even with old DOS 8.3 file names (no spaces), or with quotes, the script would run and wget would exit without writing any files. I did not try adding a wait before exiting to see whether the process was simply being killed before completing, since that seemed unlikely, and I decided to go with the VB-based, no-third-party implementation. It'd still be interesting to understand why I couldn't call wget from MS, since wget has some great options for finely controlling more complex HTTP operations.

Post by Marcus Tettmar » Fri Jan 07, 2011 6:35 pm

Two things I notice with your wget code:

1) You haven't specified a path for the log file, so wget may not be able to write it. It probably tries the current working directory, which may not be writable.

2) You are not telling it to wait for the process to complete.

The following works fine for me:

// Wait for wget to finish before continuing
Let>RP_WAIT=1
// Use absolute paths for both the log (-o) and the downloaded page (-O)
RunProgram>"C:\Program Files\GnuWin32\bin\wget.exe" -t1 -T30 -o "%SCRIPT_DIR%\log.txt" -O "%SCRIPT_DIR%\output.html" http://www.ebookexchange.com/index.php
ReadFile>%SCRIPT_DIR%\output.html,htmlsource
MessageModal>htmlsource
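The two fixes (absolute paths for `-o` and `-O`, and waiting for wget to exit before reading its output) translate directly to Python. This sketch assumes `wget` is on the PATH and that the page is UTF-8; `build_wget_cmd` and `wget_to_string` are hypothetical names.

```python
import os
import subprocess
import tempfile
from typing import List, Optional


def build_wget_cmd(wget: str, url: str, log_path: str, out_path: str) -> List[str]:
    """One try, 30 s timeout, log to -o and page to -O, both as absolute paths."""
    return [wget, "-t1", "-T30",
            "-o", os.path.abspath(log_path),
            "-O", os.path.abspath(out_path),
            url]


def wget_to_string(url: str, wget: str = "wget", work_dir: Optional[str] = None) -> str:
    """Run wget, wait for it to exit, then return the downloaded page as text."""
    work_dir = work_dir or tempfile.mkdtemp()
    log_path = os.path.join(work_dir, "log.txt")
    out_path = os.path.join(work_dir, "output.html")
    # subprocess.run blocks until wget exits: the counterpart of Let>RP_WAIT=1.
    result = subprocess.run(build_wget_cmd(wget, url, log_path, out_path))
    if result.returncode != 0:
        raise RuntimeError(f"wget exited with {result.returncode}; see {log_path}")
    # Assumes UTF-8; undecodable bytes are replaced rather than raising.
    with open(out_path, encoding="utf-8", errors="replace") as f:
        return f.read()
```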

Post by Marcus Tettmar » Fri Jan 07, 2011 6:36 pm

Add the following at the top to make it all run hidden in the background:

Let>RP_WINDOWMODE=0

Post by Me_again » Fri Jan 07, 2011 7:38 pm

That's why I install wget outside the Program Files directory: it wants to write in its home directory, and may create subdirectories, so for quick tests from the command line it's more convenient to have it somewhere it can do that.

My code included the instruction where to write:

-P../macros/macdat/ = store output in c:\macros\macdat

The options are somewhat obscure...
