REX syntax help - why doesn't this work

Post by **barry.marcus** » Wed Sep 15, 2010 12:14 pm

I want to find all standalone substrings that consist of a single lowercase "x", followed by any number of uppercase letters, followed by a space. Here's what I'm trying:

<$srch=" x[A-Z]+ " ">>=x[A-Z]+ ">
<$data="some words xHITWORD more words">
<rex ROW $srch $data>
Token found: [$ret]<br>
</rex>

This does NOT find the substring xHITWORD. What am I doing wrong?

Post by **jason112** » Wed Sep 15, 2010 12:47 pm

Use this:
<$srch=" x=[A-Z]+ " ">>=x=[A-Z]+ ">

Remember that repetition operators apply to as much as they can, not just the previous character. The + after your character class means "one or more occurrences of 'x followed by [A-Z]'".

Post by **John** » Wed Sep 15, 2010 12:48 pm

The repetition operator applies to the previous subexpression, not a single character. You probably want an = after the x.
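
For readers more used to Perl-style regular expressions, the difference can be sketched with Python's `re` module. This is only an analogy, not REX itself: the two Python patterns are assumed translations of the REX expressions in this thread, and `first_match` is a helper name made up for the example.

```python
import re

data = "some words xHITWORD more words"

# Assumed Python equivalent of the original REX " x[A-Z]+ ":
# the + repeats the whole "x followed by [A-Z]" unit.
original = re.compile(r" (?:x[A-Z])+ ")

# Assumed Python equivalent of the corrected REX " x=[A-Z]+ ":
# exactly one x, then one or more uppercase letters.
corrected = re.compile(r" x[A-Z]+ ")


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(repr(first_match(original, data)))
print(repr(first_match(corrected, data)))
print(repr(first_match(original, "some words xAxBxC more words")))
```

The first pattern finds nothing in the sample data because after " xH" it needs either another x-plus-uppercase pair or a space, and the next character is "I". It does match text such as " xAxBxC ", which is what the original expression was really asking for.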

Post by **jason112** » Wed Sep 15, 2010 12:48 pm

Note that if you don't want to include the spaces in the matched data (but still use them to perform the match), you can use the Previous & Following flags like this:
<$srch=" \P=x=[A-Z]+\F =" ">>=x=[A-Z]+\F =">
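
As a rough analogy only, the effect of excluding the surrounding spaces from the hit resembles lookbehind and lookahead in Python's `re` module. The sketch below assumes that \P and \F behave this way for this expression; it does not run REX, and it covers only the first of the two expressions above.

```python
import re

data = "some words xHITWORD more words"

# Spaces are part of the pattern and part of the match.
with_spaces = re.search(r" x[A-Z]+ ", data).group(0)

# Lookbehind and lookahead require the spaces without consuming them,
# so they are used for matching but left out of the returned text.
without_spaces = re.search(r"(?<= )x[A-Z]+(?= )", data).group(0)

print(repr(with_spaces))
print(repr(without_spaces))
```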

Post by **barry.marcus** » Wed Sep 15, 2010 1:15 pm

Thanks for the help. And Jason, thanks also for the suggestion, but in this case the space IS to be part of the matched data.

A general question... It's just not clear to me what delineates a "subexpression" in a regular expression. Is it repetition operators exclusively?

Post by **jason112** » Wed Sep 15, 2010 1:45 pm

Correct, the repetition operators define subexpressions. abcdefg= has one subexpression. a=b?c+d*efg= has 5 subexpressions: a=, b?, c+, d* and efg=.
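
To make the counting rule concrete, here is a small Python sketch that splits an expression at each repetition operator. It is a simplified illustration and not REX's actual parser: `split_subexpressions` is a made-up name, and it only understands literal characters, bracketed classes, backslash escapes and the operators = ? + * (no {x,y} counts, no anchors such as >>).

```python
def split_subexpressions(expr):
    """Split a simple expression at each repetition operator (= ? + *)."""
    parts = []
    current = ""
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):
            # Keep an escaped character together with its backslash.
            current += expr[i:i + 2]
            i += 2
        elif ch == "[":
            # Copy a character class whole, so operators inside it are ignored.
            end = expr.find("]", i + 1)
            end = len(expr) - 1 if end == -1 else end
            current += expr[i:end + 1]
            i = end + 1
        elif ch in "=?+*" and current:
            # A repetition operator closes the current subexpression.
            parts.append(current + ch)
            current = ""
            i += 1
        else:
            current += ch
            i += 1
    if current:
        # Trailing text with no operator is kept as a final piece.
        parts.append(current)
    return parts


for expr in ["abcdefg=", "a=b?c+d*efg=", " x[A-Z]+ ", " x=[A-Z]+ "]:
    print(repr(expr), "->", split_subexpressions(expr))
```

Run on the original expression from the first post, it shows " x[A-Z]+" as a single piece, which is why the + repeated the x together with the character class. For real expressions, `rex -x` remains the way to see how REX reads them.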

Post by **barry.marcus** » Wed Sep 15, 2010 1:57 pm

Got it. Thanks.

Post by **Kai** » Thu Sep 16, 2010 10:25 am

Running the command line `rex -x expr` is also useful when figuring out what `expr` really means to REX: it translates the expression to human-readable pseudocode.