The macro I'm trying (and so far failing) to create looks at the metadata tags within an HTML file and checks their content to see if they conform to the correct standard. To do this, it reads the HTML file one line at a time and checks whether each line contains the tag it's looking for. If it does, it extracts the content/value of the tag to determine whether it's valid. For example, this is one such tag:
<meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7"/>
Here the macro looks for dc:identifier and, if it finds it, saves the content to a string, in this case UK-RNIB-etc.
This is the code I'm using:
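// Read line number %dcIdentLine1% of ncc.html into strDCIdentValue1
ReadLn>%Path%\ncc.html,%dcIdentLine1%,strDCIdentValue1
// Strip leading whitespace from the line
LTrim>strDCIdentValue1,strDCIdentValue1
// Match the text between the quotes after content=; \K drops everything before it from the match
Let>dcIdentpattern=dc:identifier.+?content="\K[^"]+
// 0 = standard regex (not EasyPatterns); matches go to dcIdentMatches1_n, count to nm5; final 0 = no replace
RegEx>dcIdentpattern,strDCIdentValue1,0,dcIdentMatches1,nm5,0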
Some files, however, have an extra attribute in the tag:
<meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7" scheme="RNIB"/>
The additional scheme="RNIB", or perhaps more specifically the additional " ", looks like it could be the cause of the inconsistency: if I remove them manually first, the macro appears to work more reliably, possibly faultlessly (this needs more testing).
I don't understand why the macro works fine when stepped through but not when run at normal speed, or why the additional " " should make the difference between it working and not working.
Is there a way of modifying the RegEx pattern to be able to cope with both types of tag?
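One way to check whether the pattern itself is the problem is to test it against both forms of the tag outside Macro Scheduler. The sketch below does this in Python. Python's built-in `re` module does not support `\K`, so it puts a capturing group around `[^"]+` instead, which should extract the same text that `\K` leaves as the match. The `tags` list simply holds the two example tags from this post.

```python
import re

# Same pattern as the macro, but with a capturing group in place of \K,
# because Python's built-in re module does not support \K.
pattern = re.compile(r'dc:identifier.+?content="([^"]+)')

tags = [
    '<meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7"/>',
    '<meta name="dc:identifier" content="UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7" scheme="RNIB"/>',
]

for tag in tags:
    match = pattern.search(tag)
    # group(1) holds the text between the quotes after content=
    print(match.group(1) if match else "no match")
```

Both lines print UK-RNIB-7C53893B-8598-4961-93F8-F091A6BADFE7, because `[^"]+` stops at the closing quote of the content value and never reaches scheme="RNIB". If the PCRE engine behind RegEx> behaves the same way, that would suggest the inconsistency lies elsewhere in the macro rather than in the pattern.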