I am scraping a 10000+ line file to pull out multiple values to match with each person.
Not every value exists for every person, so I know some matches will fail.
That file is my haystack (%vHaystack% in the commands below).
Here is my command to find the ID values (just one of about 20 different values I need):
RegEx><p class="person-id">ID: [0-9]*</p>,%vHaystack%,0,vID,vIDCount,0
This is what is returned:
<p class="person-id">ID: 390783</p>
That works fine.
But I would like to return only "390783" from the file, not the entire matched string.
I have not been able to figure out the syntax to capture only the digits in the match. I have checked multiple PCRE syntax references, but none of them seem to work for me.
Some of the constructs I tried were the capturing group (....) and the lookahead (?=....):
Example: RegEx><p class="person-id">ID: ([0-9]*)</p>,%vHaystack%,0,vID,vIDCount,0
Example: RegEx><p class="person-id">ID: (?=[0-9]*)</p>,%vHaystack%,0,vID,vIDCount,0
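Why neither of those helped is easier to see in an engine that exposes groups separately. Below is a minimal sketch in Python's `re` (not Macro Scheduler script; `line` is a single hypothetical sample). A capturing group does isolate the digits, but only as group 1; the overall match (group 0) still includes both tags, so a tool that returns only the overall match, as RegEx> appears to here, still hands back the whole tag. The lookahead version cannot match at all.

```python
import re

line = '<p class="person-id">ID: 390783</p>'

# Capturing group: the overall match (group 0) still spans both tags;
# only group 1 holds the digits.
m = re.search(r'<p class="person-id">ID: ([0-9]*)</p>', line)
full = m.group(0)
digits = m.group(1)
print(full)    # <p class="person-id">ID: 390783</p>
print(digits)  # 390783

# Lookahead is zero-width: (?=[0-9]*) consumes nothing, so "</p>" would
# have to follow "ID: " immediately. It does not, so there is no match.
lookahead_match = re.search(r'<p class="person-id">ID: (?=[0-9]*)</p>', line)
print(lookahead_match)  # None
```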
I also tried the non-capturing forms, such as the non-capturing group (?:...) and the lookbehind (?<=......):
Example: RegEx>(?<p class="person-id">ID: )[0-9]*(?</p>),%vHaystack%,0,vID,vIDCount,0
Example: RegEx>(?<=<p class="person-id">ID: )[0-9]*(?<=</p>),%vHaystack%,0,vID,vIDCount,0
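The two attempts above fail for different reasons, sketched here with Python's `re` as a stand-in for PCRE (an assumption: PCRE reads `(?<name>...)` as a named group, so `(?<p class=...` is most likely rejected there too). The first pattern is not valid syntax. The second compiles, but its trailing `(?<=</p>)` looks backwards, so it requires the four characters before the current position to be `</p>`; after the digits they never are. Making that trailing assertion a lookahead, `(?=</p>)`, is what fixes it.

```python
import re

line = '<p class="person-id">ID: 390783</p>'

# First attempt: "(?<" must be followed by "=" or "!" (a lookbehind) in
# Python's re, so "(?<p ..." and "(?</p>)" are syntax errors.
attempt_a = r'(?<p class="person-id">ID: )[0-9]*(?</p>)'
try:
    re.compile(attempt_a)
    a_compiles = True
except re.error:
    a_compiles = False
print(a_compiles)  # False

# Second attempt: the leading lookbehind is fine, but the trailing
# (?<=</p>) checks the text *before* the current position. After the
# digits (or after "ID: " if no digits are taken) that text is never "</p>".
attempt_b = r'(?<=<p class="person-id">ID: )[0-9]*(?<=</p>)'
print(re.search(attempt_b, line))  # None
```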
I have also tried all four combinations of the m and s options: (ms) (-ms) (m-s) (-m-s)
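The m and s options are unlikely to matter for this pattern: in PCRE-style engines, m only changes how `^` and `$` behave and s only lets `.` match newlines, and the pattern uses none of them. A quick check in Python, with `re.M` and `re.S` standing in for the m and s options (a sketch, not Macro Scheduler's own option handling):

```python
import re

line = '<p class="person-id">ID: 390783</p>'
pattern = r'<p class="person-id">ID: [0-9]*</p>'

# Try every combination of multiline (re.M) and dotall (re.S).
# The pattern contains no ^, $ or ., so the result cannot change.
results = set()
for flags in (0, re.M, re.S, re.M | re.S):
    results.add(re.search(pattern, line, flags).group(0))
print(results)  # {'<p class="person-id">ID: 390783</p>'}
```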
I am sure there is a simple solution, but I cannot find it.
This is a simple example. I will probably end up using secondary operations on the match, such as RegEx replacements, or a more complex use of Position, Left, Right, etc. to parse out what I need. But many of the values I am looking for are much more complex.
Can someone provide the correct syntax for Macro Scheduler's PCRE RegEx to capture just a defined segment of the match?
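For comparison, the string-position fallback mentioned above (Position, Left, Right) boils down to locating a known prefix and suffix and taking the text between them. A minimal Python sketch of that idea; `between` is a hypothetical helper, not a Macro Scheduler command:

```python
line = '<p class="person-id">ID: 390783</p>'
prefix = '<p class="person-id">ID: '
suffix = '</p>'

def between(text, prefix, suffix):
    """Return the text between prefix and suffix, or None if either is missing."""
    start = text.find(prefix)
    if start == -1:
        return None              # prefix absent: the value does not exist
    start += len(prefix)         # skip past the prefix itself
    end = text.find(suffix, start)
    if end == -1:
        return None              # no closing token after the prefix
    return text[start:end]

print(between(line, prefix, suffix))           # 390783
print(between("<p>none</p>", prefix, suffix))  # None
```

This works for fixed tokens, but each extra variation in the surrounding text needs more position logic, which is where a single regex tends to be simpler.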
Capture from RegEx Match
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
Re: Capture from RegEx Match
Hi,
You want to do this:
https://help.mjtnet.com/article/12-my-most-used-regex
I guess you might also need to escape the <, > characters with a \
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
Re: Capture from RegEx Match
That was the answer, Marcus. Thank you. I did not need to escape any characters.
Here is my final code:
RegEx>(?<=<p class="person-id">ID: )[0-9]*?(?=</p>),%vHaystack2%,0,vID,vIDCount,0
The original line in the file was: <p class="person-id">ID: 390783</p>
And the RegEx above returned "390783" (without the quotes).
So the generic syntax is (?<=....)captured_chars?(?=....): the lookbehind (?<=....) and the lookahead (?=....) must match the surrounding tokens but are left out of the result, so only the characters between the tokens are returned.
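The same lookbehind/lookahead pattern can be checked outside Macro Scheduler. A sketch in Python's `re` (an illustration, not the RegEx> command itself), run over a small hypothetical haystack in which one person has no ID line, to show that each match is just the digits and that a missing value simply produces no match:

```python
import re

# Hypothetical haystack standing in for the real 10000+ line file;
# the "person-name" lines are invented for illustration.
haystack = """
<p class="person-name">Alice</p>
<p class="person-id">ID: 390783</p>
<p class="person-name">Bob</p>
<p class="person-id">ID: 112233</p>
<p class="person-name">Carol</p>
"""

# (?<=...) and (?=...) must match but are not part of the returned text.
# Python requires a fixed-width lookbehind; this literal prefix is one.
pattern = r'(?<=<p class="person-id">ID: )[0-9]*?(?=</p>)'

ids = re.findall(pattern, haystack)
print(ids)       # ['390783', '112233']
print(len(ids))  # 2

# A value that is absent yields no match (an empty list here).
print(re.findall(pattern, "<p>no id</p>"))  # []
```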
===================================
It would be great to include a sample like this somewhere in the RegEx documentation. I could not find it shown in any of the PCRE documents I scoured. It would also help to show how to create groups that can be used in RegEx replacements.
Thanks again, great support as usual.
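On the question of groups for replacements: in PCRE-style engines a capturing group `(...)` can be referenced from the replacement text. A sketch using Python's `re.sub`, where the back-reference is written `\1`; Macro Scheduler's RegEx> replacement notation is not shown here and may differ, so check its own documentation before copying the syntax.

```python
import re

line = '<p class="person-id">ID: 390783</p>'
pattern = r'<p class="person-id">ID: ([0-9]+)</p>'

# Group 1 captures the digits; \1 in the replacement inserts whatever
# group 1 matched, so the tags are dropped and the number is kept.
digits = re.sub(pattern, r'\1', line)
print(digits)  # 390783

# The group can also be wrapped in new text.
labelled = re.sub(pattern, r'person_id=\1', line)
print(labelled)  # person_id=390783
```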
- Marcus Tettmar
Re: Capture from RegEx Match
This is a very commonly needed pattern so it is also documented here:
https://help.mjtnet.com/article/12-my-most-used-regex
However, I will consider adding a link to it from the RegEx manual entry.
Marcus Tettmar