Using RegEx to search text file and create variable array

clickfast
Pro Scripter
Posts: 58
Joined: Wed May 23, 2007 12:04 am

Post by clickfast » Thu May 31, 2007 4:45 pm

Hi, I have what seems to be a simple requirement, but I'm beyond my knowledge in how to script it.

Here's what I'm trying to do:

1. Read all contents of a text file and put it into variable "xmlText"

2. Perform a search within variable "xmlText" using regular expression pattern to filter out only select URLs and put them into an array "urlText" (I already have the regular expression written but don't know how to script the search in Macro Scheduler).

Note: There will be anywhere from 1 to 5 URLs within the "xmlText" variable, never fewer than 1 and never more than 5. This is why I was thinking I need to put them into a dynamic array, but again, I'm a beginner, so I don't really know how to script this action.

3. Next, I need to have the contents of the "urlText" array written into a text file called URLS.txt, with each URL on a separate line (so they don't all run together).

4. Finally, I need to copy URLS.txt into a specific directory and overwrite an existing file of the same name.

I have tried to state this requirement as clearly as possible. Please let me know if you have more specific questions.

I'd appreciate any help from anybody... Thanks in Advance!

Marcus Tettmar
Site Admin
Posts: 7380
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Post by Marcus Tettmar » Thu May 31, 2007 4:58 pm

Something like this:

Code: Select all

//A VBScript Function to search a string for a regex pattern
//returns a list of matches separated by semicolons
VBSTART
Function regExSearch(patrn,str)
  Set regEx = New RegExp ' Create regular expression.
  regEx.Pattern = patrn ' Set pattern.
  regEx.IgnoreCase = True ' Make case insensitive. Default=False
  Set matches = RegEx.Execute(str)
  List = ""
  For each match in matches
  	 List = List & match.value & ";"
  Next
  regExSearch = Mid(List,1,Len(List)-1)
End Function
VBEND

//Read the file contents into a variable
ReadFile>YourFile.txt,FileData

//replace CRLF chars with VBScript equivalents
StringReplace>FileData,CR," & vbCR & ",FileData
StringReplace>FileData,LF," & vbLF & ",FileData
//Double quote any quotes for VBScript
StringReplace>FileData,","",FileData

//Perform the regex search
VBEval>regExSearch("REGEX_PATTERN","%FileData%"),URLList

//We now have a semicolon delimited list of URLs.  We could explode this into an array:
Separate>URLList,;,URLS
If>URLS_COUNT>0
  Let>k=1
  Repeat>k
    Let>ThisURL=URLS_%k%
    MessageModal>ThisURL
	//we could write it to a file:
	WriteLn>outputfile,result,ThisURL
    Let>k=k+1
  Until>k=URLS_COUNT
Endif

Insert your regular expression in place of REGEX_PATTERN. The VBScript function returns a semicolon-delimited list of matches. You can then split this list with Separate, loop through it with Repeat/Until, and do whatever you need with each match.

Post by clickfast » Thu May 31, 2007 5:21 pm

Thanks! I will try this and let you know the results!

Post by clickfast » Fri Jun 01, 2007 3:23 am

I created an empty script file in Macro Scheduler, pasted in your code, and got the following error when I clicked Run:

"Microsoft VBScript runtime error :5

Invalid procedure call or argument: 'Mid'

line 13, Column 2 "




JRL
Automation Wizard
Posts: 3501
Joined: Mon Jan 10, 2005 6:22 pm
Location: Iowa

Post by JRL » Fri Jun 01, 2007 3:52 am

That's exactly the error I get if I don't supply a data file. Try replacing "YourFile.txt" in the line ReadFile>YourFile.txt,FileData with the path and file name of the file that holds your data. Something like:
ReadFile>c:\URLS.txt,FileData

Does that help?
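
A note on where the 'Mid' error comes from, since it explains why a missing or unreadable data file triggers it: when the pattern finds nothing, List is still an empty string, so Len(List)-1 is -1 and Mid rejects the negative length. The sketch below is one possible guarded variant of the posted regExSearch function, not the original author's code. It also sets regEx.Global = True, which the posted function does not; VBScript's RegExp defaults Global to False, in which case Execute returns only the first match.

```
VBSTART
Function regExSearch(patrn,str)
  Dim regEx, matches, match, List
  Set regEx = New RegExp
  regEx.Pattern = patrn
  regEx.IgnoreCase = True
  regEx.Global = True ' Return every match, not just the first (default is False)
  Set matches = regEx.Execute(str)
  List = ""
  For Each match In matches
    List = List & match.Value & ";"
  Next
  ' Trim the trailing semicolon only when at least one match was found.
  ' With an empty List, Len(List)-1 is -1 and Mid raises runtime error 5.
  If Len(List) > 0 Then
    regExSearch = Left(List, Len(List)-1)
  Else
    regExSearch = ""
  End If
End Function
VBEND
```

With this variant an empty result comes back as an empty string instead of stopping the script, so the macro can check URLList before calling Separate.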

Post by clickfast » Fri Jun 01, 2007 4:19 am

I've specified the path to my file, d:\mytextfile.xml, and it still gives the error. Does it matter that it's an XML file rather than a text file?

JRL wrote:That's exactly the error I get if I don't supply a data file. Try replacing the "yourfile" in the line: ReadFile>YourFile.txt,FileData with the path and file name that your data resides within. Something like:
ReadFile>c:\URLS.txt,FileData

Does that help?

Aaron
Pro Scripter
Posts: 113
Joined: Mon Apr 09, 2007 1:35 am
Location: Wyoming

replace line 37

Post by Aaron » Fri Jun 01, 2007 4:59 am

replace line 37 - WriteLn>outputfile,result,ThisURL

with something like

WriteLn>c:\URLS.txt,result,FileData

Hope this helps
Aaron

Re: replace line 37

Post by clickfast » Fri Jun 01, 2007 6:00 am

I still get the following error even after making the change suggested below, and I was careful to replace c:\urls.txt with the actual path to my file.

"Microsoft VBScript runtime error :5

Invalid procedure call or argument: 'Mid'

line 13, Column 2 "

Aaron wrote:replace line 37 - WriteLn>outputfile,result,ThisURL

with something like

WriteLn>c:\URLS.txt,result,FileData

Hope this helps

suggestion

Post by Aaron » Fri Jun 01, 2007 8:18 am

Why don't you post your XML file along with the script?

I will be happy to give it a try on my end.
Aaron

Re: suggestion

Post by clickfast » Sat Jun 02, 2007 8:18 am

OK. I had screwed up specifying the source file to read from... my bad...

now it works...

HOWEVER, when we get to the Separate part of the code below, it seems to be dropping one of the four URLs created in the VBEval "regex" step (full script above).

I know there are 4 URLs because I did a Message> on the "URLList" variable just before the Separate command and it showed 4 URLs.

But again, for some reason, after it executes the code below it only writes 3 of the 4 URLs to D:\urls.txt.


HMMMMMMMmm...... ANY IDEAS????

Code: Select all

//We now have a semicolon delimited list of URLs.  We could explode this into an array:
Separate>URLList,;,URLS

If>URLS_COUNT>0
  Let>k=1
  Repeat>k
    Let>ThisURL=URLS_%k%
    //write it to a file:
    WriteLn>D:\urls.txt,result,ThisURL
    Let>k=k+1
  Until>k=URLS_COUNT
Endif

Post by JRL » Sat Jun 02, 2007 11:19 pm

I have an idea. Start with let>k=0 and move the let>k=k+1 line to the start of the repeat loop. As you have it, the 3rd time through the loop you set k=4 and so the loop stops before it has a chance to write the fourth URL.

Code: Select all

//We now have a semicolon delimited list of URLs. We could explode this into an array:
Separate>URLList,;,URLS

If>URLS_COUNT>0
  Let>k=0
  Repeat>k
    Let>k=k+1
    Let>ThisURL=URLS_%k%
    //write it to a file:
    WriteLn>D:\urls.txt,result,ThisURL
  Until>k=URLS_COUNT
Endif

Post by clickfast » Sun Jun 03, 2007 3:57 am

This is worth a try... I actually need it to list as many as 5 URLs. There will never be fewer than 1 URL and never more than 5.

However, I'm not sure I understand your suggestion below, because the Let>k=k+1 is already at the first line underneath the repeat.

JRL wrote:I have an idea. Start with let>k=0 and move the let>k=k+1 line to the start of the repeat loop. As you have it, the 3rd time through the loop you set k=4 and so the loop stops before it has a chance to write the fourth URL.


Post by JRL » Sun Jun 03, 2007 5:33 am

clickfast wrote:...because the Let>k=k+1 is already at the first line underneath the repeat.

Yes, but as you had it, Let>k=k+1 was at the end of the repeat loop. k would therefore become equal to URLS_COUNT one pass too early. As soon as the Until> statement is read and k is equal to URLS_COUNT, the loop ends.

Step through your code in the editor with Let>k=k+1 at the end of the repeat loop and watch what happens. The loop will run through three times (rather than four) and then quit.

On the other hand, if Let>k=k+1 is at the start of the loop, k will become equal to URLS_COUNT and then the remaining lines will execute before the Until> is read and the loop ends. Thus WriteLn> runs the required four times.

Hope this is more clear.
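
The difference between the two loop layouts can be checked with a small stand-alone sketch. It assumes a fixed URLS_COUNT of 4 in place of the value Separate would set, and uses two hypothetical counters, passesA and passesB, to record how many times each loop body runs.

```
//Hypothetical stand-in for the count that Separate would set
Let>URLS_COUNT=4

//Increment at the end: Until> sees k=4 before a fourth pass can run
Let>passesA=0
Let>k=1
Repeat>k
  Let>passesA=passesA+1
  Let>k=k+1
Until>k=URLS_COUNT

//Increment at the start: the body runs with k=4 before Until> ends the loop
Let>passesB=0
Let>k=0
Repeat>k
  Let>k=k+1
  Let>passesB=passesB+1
Until>k=URLS_COUNT

MessageModal>End increment: %passesA% passes. Start increment: %passesB% passes.
```

Traced by hand, the first loop ends after three passes and the second after four. By the same reasoning, the increment-at-the-end layout with a single URL would set k to 2 on its first pass, so Until>k=URLS_COUNT would never see k equal to 1.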

Post by clickfast » Sun Jun 03, 2007 7:56 am

Thanks JRL... I figured out what you meant shortly after posting... however, what if I want it to loop as many as 5 times? As I stated above, the file will always have at least 1 URL and up to 5 URLs - BUT NEVER MORE.

Am I still okay with your repeat settings? If not, how do I modify it to include a possible 5th URL?

THANKS!

JRL wrote:
...

On the other hand if Let>k=k+1 is at the start of the loop, k will become equal to URLS_COUNT and then the next lines will execute before the Until> is read and the loop ends. Thus the WriteLn> function will occur the required four times.

Hope this is more clear.

Post by Aaron » Mon Jun 04, 2007 4:06 am

clickfast wrote:what if i want it to loop as much as 5 times

If you had 100 URLs, it would find them all, because the loop runs until k reaches URLS_COUNT.

Hope this helps
Aaron
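
A minimal sketch of that point, assuming a hypothetical five-item list in place of the regex result. Separate sets URLS_COUNT to the number of items it finds, so the corrected loop from earlier in the thread needs no change for a fifth URL:

```
//Hypothetical five-item list standing in for the regex result
Let>URLList=one;two;three;four;five
Separate>URLList,;,URLS

//URLS_COUNT now holds the item count, so the loop makes one pass per item
If>URLS_COUNT>0
  Let>k=0
  Repeat>k
    Let>k=k+1
    Let>ThisURL=URLS_%k%
    MessageModal>ThisURL
  Until>k=URLS_COUNT
Endif
```

The item names are placeholders only; in the real script the list comes from the VBEval call.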
