HTTPRequest returns "403 HTTP/1.1 403 Forbidden"
I'm trying to use the HTTPRequest command. The MJT sample code for this function (which sends a query to Google) runs fine from my workstation, as does using this function to pull a basic HTML page from the web server I'm trying to hit.
However, when I use HTTPRequest to hit a PHP page on that same web server, which works fine in a browser, Macro Scheduler returns "403 HTTP/1.1 403 Forbidden" as the response from the web server.
This is a straight HTTP request: no SSL, no authentication, no redirects to follow, etc.
So why would Macro Scheduler be seen as different from any other user agent, and why would the server's scripting engine refuse Macro Scheduler when the same URL is served just fine to any browser?
-
- Automation Wizard
- Posts: 1101
- Joined: Fri Jan 07, 2005 5:55 pm
- Location: Somewhere else on the planet
Can you post the URL of the page? (But please don't if it's an "adult" or other potentially offensive site.)
HTTPRequest can grab the HTML generated by .php pages, but if the page has copyrighted or other content that the site owner wants to block from being harvested automatically, as you are trying to do, there are many ways they can do that.
You can see this if you try to hit
http://www.ebookexchange.com/index.php
I'm a principal on that site, and nothing is being (purposefully) done to block any kind of requests, nor is there any PHP configuration setting that I'm aware of that would cause this.
- Automation Wizard
Thanks for the link. It appears that the server (Apache) is blocking the "Mozilla/3.0 (compatible; Indy Library)" user agent that MacroScheduler's HTTPRequest sends. Most likely that block is in the .htaccess file and is designed to stop the page being accessed by very old browsers (IE8 identifies as Mozilla/4.0 and Firefox as Mozilla/5.0).
I tested this by downloading the page using wget: it works fine unless I set the user agent to that same "Mozilla/3.0 (compatible; Indy Library)", in which case I get the same 403 error. (Note that I'm using MacroScheduler version 9, and assuming the user agent has not changed.)
I suggest automating wget for Windows instead; it has many more options (maybe too many) and works well with MacroScheduler.
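A minimal sketch of the user-agent test described above, run from Macro Scheduler instead of the command line. It assumes wget is installed at the GnuWin32 path mentioned in this thread; WgetExe, LogText and the ua_*.log / ua_*.html file names are hypothetical, so adjust them for your machine. The first request uses wget's default user agent; the second sends the same Indy string through wget's -U option. If the server blocks that agent, the second log should show a 403 Forbidden response while the first does not.

```
// Sketch only: WgetExe is an assumed install path, adjust for your machine
Let>WgetExe=C:\Program Files (x86)\GnuWin32\bin\wget.exe
// Wait for each wget run to finish, and run it without a visible window
Let>RP_WAIT=1
Let>RP_WINDOWMODE=0
// 1) Request with wget's own default user agent
RunProgram>"%WgetExe%" -t1 -T30 -o "%SCRIPT_DIR%\ua_default.log" -O "%SCRIPT_DIR%\ua_default.html" http://www.ebookexchange.com/index.php
// 2) Same request, sending the Indy user agent that HTTPRequest sends
RunProgram>"%WgetExe%" -t1 -T30 -U "Mozilla/3.0 (compatible; Indy Library)" -o "%SCRIPT_DIR%\ua_indy.log" -O "%SCRIPT_DIR%\ua_indy.html" http://www.ebookexchange.com/index.php
// Show the second log, which records the status the server returned
ReadFile>%SCRIPT_DIR%\ua_indy.log,LogText
MessageModal>LogText
```

Both runs write their logs next to the script, so the two responses can be compared side by side without relying on wget's working directory.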
So, if it's an Apache setting, why would it serve an HTML page from my web site, but return that "Forbidden" message when I try to fetch a PHP page? That makes it seem more like a PHP config setting, but I can find none that corresponds to this kind of thing.
I'll have to look at wget, but it's important to me that Macro Scheduler not have to open a new window or change the window focus on my desktop when running this script (which needs to run periodically and automatically). I'm not sure whether MS can do that with an external program.
- Automation Wizard
It's strange. It's possible the .php page does a redirect, but of course I can't see the code. If you can see your .htaccess file, take a look in there; if you are running any 'bot blocker, that would be another place to look.
Anyway, yes, MacroScheduler can run a program "hidden", here's your example:
Code:
Let>RP_WINDOWMODE=0
//Run hidden
IfFileExists>c:\macros\macdat\index.php
DeleteFile>c:\macros\macdat\index.php
EndIf
Run Program>c:\getw\wget.exe -t1 -T30 http://www.ebookexchange.com/index.php -P../macros/macdat/
// -t1 = try once, -T30 = 30 second timeout, -P../macros/macdat/ = store output in c:\macros\macdat
First, let me thank you very much for the help you've provided.
I have wget, and running it from the command line works fine: it does indeed fetch web pages from my site. It also works when the current directory isn't its install folder, so it's finding the DLLs it needs.
Unfortunately, I can't get it to work from within MacroScheduler 12! I'm using:
RunProgram>C:\Program Files (x86)\GnuWin32\bin\wget.exe -t1 -T30 -o log.txt http://www.ebookexchange.com/index.php
MacroScheduler does indeed seem to be running wget (I'm trialing it in windowed mode, but the window flashes and disappears too fast to read the contents), but in this mode neither the log file nor the fetched file gets written to disk anywhere.
It doesn't seem likely that MS is closing/killing the wget process so fast the files aren't getting written, but I don't know what else the problem could be.
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
- Contact:
You have spaces in the path. Have you tried:
RunProgram>"C:\Program Files (x86)\GnuWin32\bin\wget.exe" -t1 -T30 -o log.txt http://www.ebookexchange.com/index.php
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
- Marcus Tettmar
I have also noted the issue with the user agent string. Web sites shouldn't really filter based on user agent, but nevertheless I'll see if we can change the user agent and/or make it overridable in a future release.
BTW, this works for me without having to use any third-party files or apps, and without having to save to a file:
Code:
VBSTART
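Function HTTPGet(URL)
' Drive a hidden Internet Explorer instance to load the page
Set IE = CreateObject("InternetExplorer.Application")
IE.visible = 0
IE.navigate URL
' Wait until IE has finished loading (ReadyState 4 = complete)
do while IE.Busy Or IE.ReadyState <> 4
loop
' Return the document's HTML as serialized by IE's DOM
HTTPGet = IE.document.documentelement.outerhtml
IE.quit
Set IE = Nothing
End Function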
VBEND
VBEval>HTTPGet("http://www.ebookexchange.com/index.php"),HTMLResult
MessageModal>HTMLResult
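Another way around user-agent filtering is to send a different User-Agent header yourself. The sketch below swaps in Windows' WinHttp.WinHttpRequest.5.1 COM object instead of automating Internet Explorer; it is an alternative, not the code posted above, and assumes WinHttp is available (it is included with Windows since XP). The HTTPGetUA name and the MY_USER_AGENT string are hypothetical. The agent is kept in a VBScript constant so the function has a single argument, like HTTPGet, and VBEval only has to parse one comma.

```
VBSTART
' Sketch: fetch a URL via WinHttp, sending an explicit User-Agent header.
' MY_USER_AGENT is a hypothetical value; use whatever string suits you.
Const MY_USER_AGENT = "Mozilla/5.0 (compatible; MyMacro/1.0)"
Function HTTPGetUA(URL)
Dim http
Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
' False = synchronous, so Send returns once the response has arrived
http.Open "GET", URL, False
http.SetRequestHeader "User-Agent", MY_USER_AGENT
http.Send
If http.Status = 200 Then
HTTPGetUA = http.ResponseText
Else
' Report the status (e.g. "HTTP 403 Forbidden") instead of the error page
HTTPGetUA = "HTTP " & http.Status & " " & http.StatusText
End If
Set http = Nothing
End Function
VBEND
VBEval>HTTPGetUA("http://www.ebookexchange.com/index.php"),HTMLResult
MessageModal>HTMLResult
```

Unlike the IE approach, this never starts a browser process and needs no wait loop, since the synchronous Send blocks until the response is complete. Note that ResponseText is the raw server response, whereas outerhtml is IE's re-serialization of the parsed page.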
- Marcus Tettmar
I took that from the Scripts and Tips forum.
Thanks SO much. The VBScript approach worked for me. Clearly I need to haunt the Scripts and Tips forum a little more.
FWIW:
Even using old DOS 8.3 file names (no spaces), or quoting the path, the script would run and wget would exit without writing any files. I didn't try adding a wait before the script exits, to see whether the process was simply being killed before it finished, since that seemed unlikely and I decided to go with the VBScript-based, no-third-party implementation. It'd still be interesting to understand why I couldn't call wget from MS, since wget has some great options for finely controlling more complex HTTP operations.
- Marcus Tettmar
Two things I notice with your wget code:
1) You haven't specified a path for the log file, so wget may not be able to write it: it probably doesn't know where to put it, and is perhaps trying to write to the current working directory (CWD), which may not be writable.
2) You are not telling it to wait for the process to complete.
The following works fine for me:
Code:
Let>RP_WAIT=1
RunProgram>"C:\Program Files\GnuWin32\bin\wget.exe" -t1 -T30 -o "%SCRIPT_DIR%\log.txt" -O "%SCRIPT_DIR%\output.html" http://www.ebookexchange.com/index.php
ReadFile>%SCRIPT_DIR%\output.html,htmlsource
MessageModal>htmlsource
- Marcus Tettmar
Add the following at the top to make it all run hidden in the background:
Let>RP_WINDOWMODE=0
- Automation Wizard
That's why I install wget outside the Program Files directory: it wants to write in its home directory, and maybe create subdirectories, so for quick tests from the command line it's more convenient to have it somewhere it can do that.
My code included the instruction where to write:
-P../macros/macdat/ = store output in c:\macros\macdat
The options are somewhat obscure...