REX syntax help - why doesn't this work

Posted: Wed Sep 15, 2010 12:14 pm
by barry.marcus
I want to find all standalone substrings that consist of a single lowercase "x", followed by any number of uppercase letters, followed by a space. Here's what I'm trying:

<$srch=" x[A-Z]+ " ">>=x[A-Z]+ ">
<$data="some words xHITWORD more words">
<rex ROW $srch $data>
Token found: [$ret]<br>
</rex>

This does NOT find the substring xHITWORD. What am I doing wrong?

Posted: Wed Sep 15, 2010 12:47 pm
by jason112
Use this
<$srch=" x=[A-Z]+ " ">>=x=[A-Z]+ ">

Remember that a repetition operator applies to as much of the preceding expression as it can, not just the previous character. The + after your character class means "one or more occurrences of 'x followed by [A-Z]'".
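
For readers more used to Perl-style regular expressions, here is a rough analogy in Python's `re` module. This is not REX. The explicit `(?:...)` groups are an assumption about how the two REX expressions above group their characters, based only on the rule described in this thread, and the variable names are made up for illustration.

```python
import re

data = "some words xHITWORD more words"

# Assumed analogue of the REX expression " x[A-Z]+ ":
# the + repeats the whole run " x[A-Z]", not just the character class.
broken = re.search(r"(?: x[A-Z])+ ", data)

# Assumed analogue of " x=[A-Z]+ ": the = closes off " x",
# so the + now repeats only [A-Z].
fixed = re.search(r"(?: x)[A-Z]+ ", data)

print(broken)               # None
print(repr(fixed.group()))  # ' xHITWORD '
```

The first pattern fails because after " xH" the next character is "I", which is neither another " x" pair nor the closing space.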

Posted: Wed Sep 15, 2010 12:48 pm
by John
The repetition operator applies to the previous subexpression, not to a single character. You probably want an = after the x.

Posted: Wed Sep 15, 2010 12:48 pm
by jason112
Note that if you don't want to include the spaces in the matched data (but still use them to perform the match), you can use the Previous & Following flags like this:
<$srch=" \P=x=[A-Z]+\F =" ">>=x=[A-Z]+\F =">
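
As a loose comparison only: in Python's `re` module, the closest counterpart to "use the spaces for matching but leave them out of the result" is lookbehind and lookahead. The sketch below is not REX, and mapping \P and \F onto lookaround is an assumption made for illustration; it only mirrors the behaviour described above.

```python
import re

data = "some words xHITWORD more words"

# (?<= ) and (?= ) require a space on each side of the hit
# but keep those spaces out of the matched text.
trimmed = re.search(r"(?<= )x[A-Z]+(?= )", data)

print(repr(trimmed.group()))  # 'xHITWORD'
```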

Posted: Wed Sep 15, 2010 1:15 pm
by barry.marcus
Thanks for the help. And Jason, thanks also for the suggestion, but in this case the space IS to be part of the matched data.

A general question... It's just not clear to me what delineates a "subexpression" in a regular expression. Is it repetition operators exclusively?

Posted: Wed Sep 15, 2010 1:45 pm
by jason112
Correct, the repetition operators define subexpressions. abcdefg= has one subexpression. a=b?c+d*efg= has 5 subexpressions.
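
To make the counting concrete, here is a toy splitter in Python. It is not REX's parser: the function name `split_subexpressions` is hypothetical, it knows only the four operators =, ?, + and * mentioned in this thread, it treats anything inside [...] as opaque, and it ignores escapes and every other REX feature. Treating a trailing run with no operator as its own subexpression is also an assumption.

```python
def split_subexpressions(expr: str) -> list[str]:
    """Split expr after each repetition operator (toy model only)."""
    parts = []
    current = ""
    in_class = False  # True while inside a [...] character class
    for ch in expr:
        current += ch
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch in "=?+*" and not in_class:
            # An operator closes the current subexpression.
            parts.append(current)
            current = ""
    if current:
        # Trailing run with no operator (assumed to stand alone).
        parts.append(current)
    return parts


print(split_subexpressions("abcdefg="))      # ['abcdefg=']
print(split_subexpressions("a=b?c+d*efg="))  # ['a=', 'b?', 'c+', 'd*', 'efg=']
print(split_subexpressions(" x=[A-Z]+ "))    # [' x=', '[A-Z]+', ' ']
```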

Posted: Wed Sep 15, 2010 1:57 pm
by barry.marcus
Got it. Thanks.

Posted: Thu Sep 16, 2010 10:25 am
by Kai
Running `rex -x expr' from the command line is also useful for figuring out what `expr' really means to REX: it translates the expression into human-readable pseudocode.