Technical support and scripting issues
Moderators: JRL, Dorian (MJT support)
-
travel88
- Newbie
- Posts: 15
- Joined: Sun May 20, 2007 11:42 am
by travel88 » Sun Dec 28, 2008 1:22 pm
Hi,
Can you help me write some code, preferably VBScript, to extract just the http URL from the text below? The surrounding text may vary, but the URL pattern is always the same.
Code:
: xxx x xx x x x x x x x x x x xxxx xxxxx http://images.google.com/imgres?imgurl=http://jquery.com/demo/thickbox/images/plant4.jpg&imgrefurl=http://jquery.com/demo/thickbox/&usg=__-9nKVuEbmoL4CJiJMB9lzrsES3o=&h=480&w=640&sz=207&hl=en&start=7&um=1&tbnid=eWicC5rSDLtyCM:&tbnh=103&tbnw=137&prev=/images%3Fq%3Dimages%26um%3D1%26hl%3Den%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26sa%3DN)
xxxx xxxxx
xxxxxx
The pattern starts with http:// and ends with the bracket ) .
Thanks for helping
-
by travel88 » Mon Dec 29, 2008 3:18 pm
Sorry, I tried my best but cannot get it right. I hope you can help.
Thanks
-
Bob Hansen
- Automation Wizard
- Posts: 2475
- Joined: Tue Sep 24, 2002 3:47 am
- Location: Salem, New Hampshire, US
by Bob Hansen » Mon Dec 29, 2008 11:10 pm
Please post your script so it can be reviewed and edited.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!
-
by travel88 » Thu Jan 01, 2009 5:17 am
Code:
//A VBScript Function to search a string for a regex pattern
//returns a list of matches separated by semicolons
VBSTART
Function regExSearch(patrn,str)
  Set regEx = New RegExp ' Create regular expression.
  regEx.Pattern = patrn ' Set pattern.
  regEx.IgnoreCase = True ' Make case insensitive. Default=False
  Set matches = RegEx.Execute(str)
  List = ""
  For each match in matches
    List = List & match.value & ";"
  Next
  regExSearch = Mid(List,1,Len(List)-1)
End Function
VBEND
//Read the file contents into a variable
ReadFile>C:\MSG1.txt,FileData
//replace CRLF chars with VBScript equivalents
StringReplace>FileData,CR," & vbCR & ",FileData
StringReplace>FileData,LF," & vbLF & ",FileData
//Double quote any quotes for VBScript
StringReplace>FileData,","",FileData
//Perform the regex search
VBEval>regExSearch("REGEX_PATTERN","%FileData%"),URLList
//We now have a semicolon delimited list of URLs. We could explode this into an array:
Separate>URLList,;,URLS
If>URLS_COUNT>0
Let>k=1
Repeat>k
Let>ThisURL=URLS_%k%
MessageModal>ThisURL
//we could write it to a file:
WriteLn>C:\result.txt,result,ThisURL
Let>k=k+1
Until>k,URLS_COUNT
Endif
This always gives a VBScript runtime error.
Thanks
-
by travel88 » Fri Jan 09, 2009 11:56 pm
Any help here?
Thanks
-
Me_again
- Automation Wizard
- Posts: 1101
- Joined: Fri Jan 07, 2005 5:55 pm
- Location: Somewhere else on the planet
by Me_again » Sat Jan 10, 2009 12:14 am
VBEval>regExSearch("REGEX_PATTERN","%FileData%"),URLList
I believe it's choking on the VBEval because you have not defined "REGEX_PATTERN".
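To illustrate the point: a placeholder such as "REGEX_PATTERN" is itself a valid regex, it just matches nothing in the file, so the list of matches comes back empty. In the VBScript function above, Mid(List,1,Len(List)-1) is then called with a length of -1, which is a likely source of the runtime error. The sketch below is in Python purely for illustration (it is not Macro Scheduler code); regex_search, the sample text and the example.com URL are made up for the demonstration.

```python
import re


def regex_search(pattern: str, text: str) -> str:
    """Return every match joined by semicolons, or "" when nothing matches."""
    matches = [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
    # join() copes with an empty list, so no trailing-semicolon trimming is needed
    return ";".join(matches)


# Hypothetical sample text, not the file from the thread
sample = "xxx http://example.com/page?a=1) xxxx"

# The literal placeholder is treated as a pattern and finds nothing
placeholder_result = regex_search("REGEX_PATTERN", sample)

# A real pattern: everything from http:// up to (not including) whitespace or ")"
real_result = regex_search(r"https?://[^\s)]+", sample)
```

The design point is only that the "no match" case has to be handled before trimming the trailing separator.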
-
by travel88 » Sat Jan 10, 2009 12:24 am
I need help defining the pattern.
The URL in the example above has to be extracted starting at http:// and ending at the closing bracket ")".
Thanks very much for helping me with this, as I have been scratching my head over it for a long time.

-
by travel88 » Thu Jan 15, 2009 5:22 pm
Any help?
-
Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
by Marcus Tettmar » Thu Jan 15, 2009 5:35 pm
Detecting URLs in a block of text can never be 100% reliable. Regex guru Jan Goyvaerts has a good post on this here.
But we can use Jan's suggested regex:
Code:
VBSTART
Function regExSearch(patrn,str)
  Set regEx = New RegExp ' Create regular expression.
  regEx.Pattern = patrn ' Set pattern.
  regEx.IgnoreCase = True ' Make case insensitive. Default=False
  regEx.Global = True ' Find all matches, not just the first. Default=False
  Set matches = regEx.Execute(str)
  List = ""
  For Each match In matches
    List = List & match.Value & ";"
  Next
  ' Strip the trailing semicolon, but only if something matched
  If Len(List) > 0 Then List = Left(List, Len(List)-1)
  regExSearch = List
End Function
VBEND
Let>FileData=: xxx x xx x x x x x x x x x x xxxx xxxxx http://images.google.com/imgres?imgurl=http://jquery.com/demo/thickbox/images/plant4.jpg&imgrefurl=http://jquery.com/demo/thickbox/&usg=__-9nKVuEbmoL4CJiJMB9lzrsES3o=&h=480&w=640&sz=207&hl=en&start=7&um=1&tbnid=eWicC5rSDLtyCM:&tbnh=103&tbnw=137&prev=/images%3Fq%3Dimages%26um%3D1%26hl%3Den%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26sa%3DN) xxxx
//for VBScript double quote quotes and replace hard line breaks with vbCRLF
StringReplace>FileData,","",FileData
StringReplace>FileData,CRLF," & vbCRLF & ",FileData
VBEval>regExSearch("\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]","%FileData%"),URL
MessageModal>URL
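For anyone who wants to try the pattern outside Macro Scheduler, here is a rough Python sketch that applies the same regex with case-insensitive matching. Python is used only for illustration; find_urls is a made-up helper name, and the sample text is a shortened version of the string in the post, not the full original.

```python
import re

# The pattern from the VBEval line above, compiled case-insensitively
URL_PATTERN = re.compile(
    r"\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]",
    re.IGNORECASE,
)


def find_urls(text: str) -> list[str]:
    """Return the full text of every URL-like match, in order."""
    # group(0) is the whole match; group(1) would only be the scheme
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


# Shortened, illustrative sample based on the text in the thread
sample = (
    "xxxx xxxxx http://images.google.com/imgres?imgurl="
    "http://jquery.com/demo/thickbox/images/plant4.jpg&h=480&w=640) xxxx"
)

urls = find_urls(sample)
```

Because ")" is not in either character class, the match stops just before the closing bracket, and the inner http:// is swallowed as part of the one outer URL rather than reported separately.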
-
by travel88 » Sat Jan 17, 2009 3:29 am
Thank you. It works perfectly.

-
jpuziano
- Automation Wizard
- Posts: 1085
- Joined: Sat Oct 30, 2004 12:00 am
by jpuziano » Sat Jan 17, 2009 5:16 am
Hi Marcus,
Thanks for the code example above and the link. Jan is indeed a regex guru and makes great products. I even admire the way his websites are laid out, especially the Version History pages, which give a good level of detail about every improvement, bug fix, or new feature.
FYI - the code block in your post is not showing a vertical scroll bar, so I can't scroll to see all the code. I have to press CTRL-A, copy all the lines, and paste them elsewhere just to see them. The horizontal scroll bar is there, just not the vertical one.
Browser here is IE7 on XP SP3 with all the latest Windows updates, up-to-date as of 10 minutes ago (a bunch of security updates were installed).
Thanks again and take care
Last edited by jpuziano on Fri Feb 20, 2009 5:50 am, edited 3 times in total.
-
by jpuziano » Sat Jan 17, 2009 5:40 am
Hi travel88,
When you said:
- The pattern starts with http:// and ends with the bracket ) .
Did you mean that the ) is actually part of the URL?
I tried the above code and for me, it clips the ) off the end when it returns the URL. However, that may not be important to you, because your URL seems to bring up the same web page whether you include the ) at the end or leave it off. At least that's what it did for me.
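If the closing bracket really does need to be kept, one option is a pattern built directly from the rule as stated: start at http:// and run up to and including the first ")". The Python sketch below is only an illustration of that idea (Python is not part of Macro Scheduler); extract_with_bracket and the sample strings are made up, and the pattern assumes the URL itself contains no whitespace and no earlier ")".

```python
import re

# From http:// (or https://) up to and including the first ")"
URL_WITH_BRACKET = re.compile(r"https?://[^\s)]*\)", re.IGNORECASE)


def extract_with_bracket(text: str) -> list[str]:
    """Return every span that starts at http(s):// and ends with ')'."""
    return [m.group(0) for m in URL_WITH_BRACKET.finditer(text)]


# Illustrative samples, shortened from the text in the thread
with_bracket = extract_with_bracket(
    "xxxxx http://images.google.com/imgres?imgurl="
    "http://jquery.com/demo/plant4.jpg&w=640) xxxx"
)
without_bracket = extract_with_bracket("see http://example.com/page and more")

# Dropping the bracket afterwards is a one-liner if it turns out not to be wanted
stripped = [u.rstrip(")") for u in with_bracket]
```

Note that a URL with no closing bracket after it is not matched at all by this variant, which may or may not be what is wanted.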
Take care