RegEx Help

RNIB
Macro Veteran
Posts: 198
Joined: Thu Jan 10, 2008 10:25 am
Location: London, UK

RegEx Help

Post by RNIB » Tue Nov 26, 2024 11:35 am

I think I may finally have worked out why my macro is being so inconsistent. It's to do with the RegEx I'm using, but I don't understand RegEx well enough (or at all) to work out how to correct it.

The macro I'm failing to create looks at the metadata tags within an HTML file and checks their content to see if it conforms to the correct standard. To do this, it reads the HTML file one line at a time and checks whether each line is the tag it's looking for. If it is, it extracts the content/value of the tag to determine whether it's valid. For example, this is one such tag:
<meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7"/>
Here the macro looks for dc:identifier and, if it finds it, saves the content to a string, in this case UK-RNIB-etc.

This is the code I'm using:

Code:

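//Read the line whose number is held in dcIdentLine1 from ncc.html
ReadLn>%Path%\ncc.html,%dcIdentLine1%,strDCIdentValue1
LTrim>strDCIdentValue1,strDCIdentValue1
//Find dc:identifier, skip ahead to content=", drop what has matched so far (\K) and keep everything up to the next quote
Let>dcIdentpattern=dc:identifier.+?content="\K[^"]+
//No EasyPatterns, no replace: the value goes to dcIdentMatches1_1 and the match count to nm5
RegEx>dcIdentpattern,strDCIdentValue1,0,dcIdentMatches1,nm5,0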

This seems to work most of the time. However, I've found that the metadata tag is sometimes formed slightly differently, such as:
<meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7" scheme="RNIB"/>
The additional scheme="RNIB", or perhaps more specifically the additional " ", looks like it could be the cause of the inconsistency, because if I remove it manually first the macro appears to work more reliably, possibly faultlessly (this needs more testing).

I don't know why the macro would work fine when stepped through but not when run at normal speed, or why the additional " " would make the difference between it working and not working.

Is there a way of modifying the RegEx pattern to be able to cope with both types of tag?
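For readers who want to follow the logic outside Macro Scheduler, the line-by-line approach described above can be sketched in Python. This is only an illustration under stated assumptions, not the poster's macro: `find_meta_content` is a hypothetical helper, the lines are passed in as a list rather than read with ReadLn, and because Python's `re` module does not support PCRE's `\K`, a capture group stands in for it.

```python
import re


def find_meta_content(lines, name):
    """Scan lines one at a time and return the content value from the first
    line that mentions `name`, or None if no line matches."""
    # Equivalent of the macro's pattern dc:identifier.+?content="\K[^"]+ ;
    # Python's re has no \K, so the value is taken from capture group 1 instead.
    pattern = re.compile(re.escape(name) + r'.+?content="([^"]+)')
    for line in lines:
        match = pattern.search(line.lstrip())  # lstrip mirrors the LTrim step
        if match:
            return match.group(1)
    return None  # no line held the tag, so the caller can treat it as missing


if __name__ == "__main__":
    html_lines = [
        "<head>",
        '  <meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7"/>',
        "</head>",
    ]
    print(find_meta_content(html_lines, "dc:identifier"))
```

Returning `None` keeps the "tag not found" case separate from the "tag found but value invalid" case, so each can be handled explicitly.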

Dorian (MJT support)
Automation Wizard
Posts: 1399
Joined: Sun Nov 03, 2002 3:19 am

Re: RegEx Help

Post by Dorian (MJT support) » Wed Nov 27, 2024 11:21 am

I'm no RegEx expert, though I know there are a few in here, but I can't replicate this. I'm getting the intended result with your RegEx whichever string I use, so I think this may be a red herring:

Code:

Let>strDCIdentValue1=<meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7"/>
LTrim>strDCIdentValue1,strDCIdentValue1
Let>dcIdentpattern=dc:identifier.+?content="\K[^"]+
RegEx>dcIdentpattern,strDCIdentValue1,0,dcIdentMatches1,nm5,0
mdl>dcIdentMatches1_1
put>dcIdentMatches1_1
//Result : UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7


Let>strDCIdentValue1=<meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7" scheme="RNIB"/>
LTrim>strDCIdentValue1,strDCIdentValue1
Let>dcIdentpattern=dc:identifier.+?content="\K[^"]+
RegEx>dcIdentpattern,strDCIdentValue1,0,dcIdentMatches2,nm5,0
mdl>dcIdentMatches2_1
put>dcIdentMatches2_1
//Result : UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7
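This result is what the structure of the pattern predicts. The lazy `.+?` stops at the first `content="`, and the negated class `[^"]+` cannot cross a double quote, so it ends at the quote closing the content value. A trailing attribute such as `scheme="RNIB"` is therefore never reached. A minimal Python check follows, again using a capture group in place of `\K`, which Python's `re` module lacks.

```python
import re

# Capture-group version of the thread's pattern: dc:identifier.+?content="\K[^"]+
PATTERN = re.compile(r'dc:identifier.+?content="([^"]+)')

PLAIN = '<meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7"/>'
WITH_SCHEME = ('<meta name="dc:identifier" '
               'content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7" scheme="RNIB"/>')

for tag in (PLAIN, WITH_SCHEME):
    # [^"]+ stops at the first closing quote, so scheme="RNIB" never enters the match
    print(PATTERN.search(tag).group(1))
```

Both iterations print the same identifier. This supports the view that the pattern itself is not what makes the macro inconsistent.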

RNIB
Macro Veteran
Posts: 198
Joined: Thu Jan 10, 2008 10:25 am
Location: London, UK

Re: RegEx Help

Post by RNIB » Wed Nov 27, 2024 3:39 pm

Yeah, I think you are right. The weird thing is that the first time it sees this particular tag, it appears to think it's missing and adds a new one. When it then starts the second loop, it always reads this tag correctly and so doesn't try to add a new one. The difference was that, in my test set of files, the first HTML file had the additional scheme part of the tag whilst the remaining ones didn't.

Any ideas why a log file isn't created even though I've selected one and tried every logging option? The instructions don't say what format the log file should be. I've tried creating a blank text file and saving it with both a .txt and a .log extension, but nothing gets added.

Dorian (MJT support)
Automation Wizard
Posts: 1399
Joined: Sun Nov 03, 2002 3:19 am

Re: RegEx Help

Post by Dorian (MJT support) » Wed Nov 27, 2024 3:53 pm

You may have missed my previous response to that. Exit the editor and then run the script. Logs aren't generated when in the editor. The logs will be your key to this.

RNIB
Macro Veteran
Posts: 198
Joined: Thu Jan 10, 2008 10:25 am
Location: London, UK

Re: RegEx Help

Post by RNIB » Thu Nov 28, 2024 2:10 pm

Ahh, okay. Well I've got the logging working at last.

The only problem now is that with logging enabled it runs really, really slowly. However, because it runs so slowly, it works as it should, and so the log file doesn't show where it's going wrong. Insert emoji of banging head against a brick wall :lol:

Sigh, back to the drawing board. Think I'm going to have to start again or give up. The latter feels more appealing at the moment :wink:

Dorian (MJT support)
Automation Wizard
Posts: 1399
Joined: Sun Nov 03, 2002 3:19 am

Re: RegEx Help

Post by Dorian (MJT support) » Thu Nov 28, 2024 2:19 pm

Well, that brings you back to your earlier hypothesis of timing.
