Capture from RegEx Match

Technical support and scripting issues


obfusc88
Pro Scripter
Posts: 85
Joined: Wed Mar 14, 2007 6:22 pm


Post by obfusc88 » Sun Apr 11, 2021 8:30 pm

I am scraping a 10,000+ line file to pull out multiple values for each person.
Not every value exists for every person, so I know some matches will fail.
That file is my Haystack.

Here is my command to find the ID values (just one of about 20 different values I need):
RegEx><p class="person-id">ID: [0-9]*</p>,%vHaystack%,0,vID,vIDCount,0

This is what is returned:
<p class="person-id">ID: 390783</p>
That works fine.
But I would like to only return the "390783" from the file, not the entire matched string.

I have not been able to figure out the syntax to capture only the digits of the match. I have checked several PCRE syntax references, but none of the approaches I found work for me.
For capturing, I tried a group (....) and a lookahead (?=....):
Example: RegEx><p class="person-id">ID: ([0-9]*)</p>,%vHaystack%,0,vID,vIDCount,0
Example: RegEx><p class="person-id">ID: (?=[0-9]*)</p>,%vHaystack%,0,vID,vIDCount,0

I also tried non-capturing constructs, such as (?:...) and the lookbehind (?<=......):
Example: RegEx>(?<p class="person-id">ID: )[0-9]*(?</p>),%vHaystack%,0,vID,vIDCount,0
Example: RegEx>(?<=<p class="person-id">ID: )[0-9]*(?<=</p>),%vHaystack%,0,vID,vIDCount,0

I have tried all four combinations of the (ms) modifiers: (ms) (-ms) (m-s) (-m-s)
I am sure there is a simple solution, but I cannot find it.
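To see why these attempts behave as they do, here is a minimal sketch using Python's `re` module, assumed here only as a stand-in for Macro Scheduler's PCRE engine (lookahead, lookbehind and capturing groups behave the same way for these particular patterns). It assumes the haystack is just the single sample line. The `(?<p ...)` attempt is left out, since a lookbehind needs `(?<=`, not `(?<`.

```python
import re

line = '<p class="person-id">ID: 390783</p>'

# 1. Plain pattern (the first command above): the match is the whole tag.
plain = re.search(r'<p class="person-id">ID: [0-9]*</p>', line)
print(plain.group(0))  # <p class="person-id">ID: 390783</p>

# 2. Capturing group: group(0) is still the whole tag; the digits are
#    only available separately, as group(1).
grouped = re.search(r'<p class="person-id">ID: ([0-9]*)</p>', line)
print(grouped.group(0))  # <p class="person-id">ID: 390783</p>
print(grouped.group(1))  # 390783

# 3. Lookahead in place of the digits: (?=...) is zero-width, so "</p>"
#    would have to follow "ID: " directly, and the search fails.
lookahead_only = re.search(r'<p class="person-id">ID: (?=[0-9]*)</p>', line)
print(lookahead_only)  # None

# 4. Lookbehind (?<=...) at the end asserts that "</p>" ends at the current
#    position, which is never true right after the digits, so this fails too.
trailing_lookbehind = re.search(
    r'(?<=<p class="person-id">ID: )[0-9]*(?<=</p>)', line)
print(trailing_lookbehind)  # None
```

Case 2 is the usual PCRE answer, but if a tool hands back only the whole match, as the RegEx> call here appears to, the group alone does not trim the result.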

This is a simple example. For it, I will probably end up using secondary operations, such as a RegEx replace on the match or some combination of Position, Left, Right, etc., to parse out what I need. But many of the other values I am looking for are much more complex.
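The string-function fallback mentioned above can be sketched in Python, with `str.find` and slicing standing in for Macro Scheduler's Position/Left/Right family. `text_between` is a hypothetical helper name, and returning `None` for a missing marker is an assumption that reflects the note that not every value exists.

```python
from typing import Optional


def text_between(haystack: str, before: str, after: str) -> Optional[str]:
    """Hypothetical helper: return the text between two literal markers."""
    start = haystack.find(before)
    if start == -1:
        return None  # opening marker absent: this value does not exist
    start += len(before)  # skip past the opening marker itself
    end = haystack.find(after, start)
    if end == -1:
        return None  # no closing marker after the opening one
    return haystack[start:end]


line = '<p class="person-id">ID: 390783</p>'
print(text_between(line, '<p class="person-id">ID: ', '</p>'))  # 390783
print(text_between(line, '<p class="person-name">', '</p>'))    # None
```

This works for fixed markers, but each value needs its own pair of markers, which is where the regex approach becomes more compact.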

Can someone provide the correct syntax for the Macro Scheduler PCRE RegEx for capturing just a defined segment of the match?

Marcus Tettmar
Site Admin
Posts: 7376
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Re: Capture from RegEx Match

Post by Marcus Tettmar » Mon Apr 12, 2021 12:25 pm

Hi,

You want to do this:

https://help.mjtnet.com/article/12-my-most-used-regex

I guess you might also need to escape the <, > characters with a \
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

obfusc88
Pro Scripter
Posts: 85
Joined: Wed Mar 14, 2007 6:22 pm

Re: Capture from RegEx Match

Post by obfusc88 » Mon Apr 12, 2021 4:12 pm

That was the answer, Marcus. Thank you. I did not need to escape any characters.
Here is my final code:


RegEx>(?<=<p class="person-id">ID: )[0-9]*?(?=</p>),%vHaystack2%,0,vID,vIDCount,0
The original line in the file was: <p class="person-id">ID: 390783</p>
And the RegEx above returned "390783" (without the quotes).

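So, the generic syntax is (?<=....)pattern(?=....), where pattern is something like [0-9]*?. The lookbehind and lookahead must match around the wanted text but are not included in the result, so only the characters between them are returned.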
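The generic form can be checked with a minimal Python sketch, again using `re` only as a stand-in for Macro Scheduler's PCRE engine. The second `<div>` with ID 412 is made-up sample data, and `extract_between` is a hypothetical helper that wraps literal markers in `(?<=...)` and `(?=...)`. The `?` after `[0-9]*` makes the repetition lazy: with digits it changes nothing, because a digit run cannot run past `</p>`, but with a broad `.*?` between markers it stops at the first closing marker rather than the last one on the line.

```python
import re
from typing import List

# Sample haystack: the first ID comes from the thread; the second person,
# with ID 412, is made-up data for illustration.
haystack = """<div class="person">
<p class="person-id">ID: 390783</p>
</div>
<div class="person">
<p class="person-id">ID: 412</p>
</div>"""

# The pattern from the thread: only the digits between the markers match.
ids = re.findall(r'(?<=<p class="person-id">ID: )[0-9]*?(?=</p>)', haystack)
print(ids)  # ['390783', '412']


def extract_between(before: str, after: str, text: str) -> List[str]:
    # Hypothetical helper: build (?<=before).*?(?=after) from literal markers.
    # re.escape keeps the markers literal; Python requires a fixed-width
    # lookbehind, which an escaped literal string always is.
    pattern = "(?<=" + re.escape(before) + ").*?(?=" + re.escape(after) + ")"
    return re.findall(pattern, text)


print(extract_between('<p class="person-id">ID: ', "</p>", haystack))
```

Because the pattern contains no capturing groups, `re.findall` returns the whole matches, which here are exactly the text between the lookarounds.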

===================================
It would be great to include a sample like this somewhere in the RegEx documentation. I could not find it shown in any of the PCRE documents I scoured. It would also help to show how to make groups that can be used in RegEx replacements.
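For the groups-in-replacements question, here is a minimal Python sketch of the general idea: parentheses define numbered groups, and the replacement string refers back to them. Python's `re.sub` writes those references as `\1` or `\g<1>`; the replacement syntax Macro Scheduler's RegEx> replace mode accepts is not covered in this thread, so treat this form as an assumption to check against its manual entry.

```python
import re

line = '<p class="person-id">ID: 390783</p>'

# Group 1 captures the digits; \1 in the replacement refers back to them,
# so the whole tag is replaced by just the ID.
only_id = re.sub(r'<p class="person-id">ID: ([0-9]*)</p>', r'\1', line)
print(only_id)  # 390783

# Two groups can be rearranged: \2 is the digits, \1 is the "ID" label.
relabelled = re.sub(r'<p class="person-id">(ID): ([0-9]*)</p>', r'\2 (\1)', line)
print(relabelled)  # 390783 (ID)
```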

Thanks again, great support as usual.

Marcus Tettmar
Site Admin
Posts: 7376
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Re: Capture from RegEx Match

Post by Marcus Tettmar » Mon Apr 19, 2021 10:00 pm

This is a very commonly needed pattern so it is also documented here:
https://help.mjtnet.com/article/12-my-most-used-regex

However, I will consider adding a link to it from the RegEx manual entry.
