REX syntax help - why doesn't this work

barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Post by barry.marcus »

I want to find all standalone substrings that consist of a single lowercase "x", followed by one or more uppercase letters, followed by a space. Here's what I'm trying:

<$srch=" x[A-Z]+ " ">>=x[A-Z]+ ">
<$data="some words xHITWORD more words">
<rex ROW $srch $data>
Token found: [$ret]<br>
</rex>

This does NOT find the substring xHITWORD. What am I doing wrong?
jason112
Site Admin
Posts: 347
Joined: Tue Oct 26, 2004 5:35 pm

Post by jason112 »

Use this
<$srch=" x=[A-Z]+ " ">>=x=[A-Z]+ ">

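Remember that repetition operators apply to as much as they can (the whole preceding subexpression), not just the previous character. The + after your character class means "one or more occurrences of ' x' followed by [A-Z]", not "one or more uppercase letters". Adding = after the x makes " x" its own subexpression, matched exactly once, so the + applies only to [A-Z].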
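For anyone more used to Perl/Python-style regular expressions, the difference can be sketched with Python's re module. This is an approximation, not REX itself: it assumes, as described in this thread, that a REX repetition operator repeats everything since the previous operator, so a Python (?:...) group stands in for a REX subexpression. Only the first pattern of each $srch list is translated.

```python
import re

data = "some words xHITWORD more words"

# Approximate Python translation of the original REX pattern " x[A-Z]+ ":
# the "+" repeats the whole subexpression " x[A-Z]", then a space must follow.
original = re.compile(r"(?: x[A-Z])+ ")

# Approximate translation of the fix " x=[A-Z]+ ": " x" is matched once,
# and "+" applies only to the character class.
fixed = re.compile(r" x[A-Z]+ ")

print(original.search(data))             # None: " xH" is followed by "I"
print(repr(fixed.search(data).group()))  # ' xHITWORD '
```

The original pattern only matches runs such as " xA xB " (each repetition needs its own space and x), which is why it never finds " xHITWORD ".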
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

The repetition operator applies to the previous subexpression, not to a single character, so you probably want an = after the x.
John Turnbull
Thunderstone Software
jason112
Site Admin
Posts: 347
Joined: Tue Oct 26, 2004 5:35 pm

Post by jason112 »

Note that if you don't want to include the spaces in the matched data (but still use them to perform the match), you can use the Previous & Following flags like this:
<$srch=" \P=x=[A-Z]+\F =" ">>=x=[A-Z]+\F =">
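A rough Python analogue of this idea uses lookbehind and lookahead assertions: the surrounding spaces must be present for the match to succeed, but they are left out of the matched text. Treating \P and \F as equivalent to Python lookarounds is an assumption made for illustration; REX's own semantics may differ in edge cases.

```python
import re

data = "some words xHITWORD more words"

# Assumed analogue of " \P=x=[A-Z]+\F =": require a space before and after,
# but exclude both spaces from the matched text.
pattern = re.compile(r"(?<= )x[A-Z]+(?= )")

m = pattern.search(data)
print(m.group())  # xHITWORD
```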
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Post by barry.marcus »

Thanks for the help. And Jason, thanks also for the suggestion, but in this case the space IS to be part of the matched data.

A general question... It's just not clear to me what delineates a "subexpression" in a regular expression. Is it repetition operators exclusively?
jason112
Site Admin
Posts: 347
Joined: Tue Oct 26, 2004 5:35 pm

Post by jason112 »

Correct: the repetition operators delimit subexpressions. abcdefg= has one subexpression. a=b?c+d*efg= has 5 subexpressions: a=, b?, c+, d*, and efg=.
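To make the rule concrete, here is a hypothetical Python helper (split_subexpressions is not part of Texis or REX) that splits an expression at each repetition operator. It is a simplified sketch: it only recognizes =, ?, + and *, treats backslash escapes and [...] classes as single atoms, ignores other REX syntax, and assumes that trailing text with no operator forms a final subexpression matched once.

```python
from __future__ import annotations


def split_subexpressions(expr: str) -> list[str]:
    """Split a simplified REX-style expression into subexpressions.

    Hypothetical helper: a subexpression is everything accumulated since
    the previous repetition operator, up to and including the next one.
    """
    operators = "=?+*"
    parts: list[str] = []
    current = ""
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):
            current += expr[i:i + 2]  # escaped character is one atom
            i += 2
        elif ch == "[":
            end = expr.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated character class")
            current += expr[i:end + 1]  # whole [...] class is one atom
            i = end + 1
        elif ch in operators:
            parts.append(current + ch)  # operator closes the subexpression
            current = ""
            i += 1
        else:
            current += ch
            i += 1
    if current:
        parts.append(current)  # assumed: trailing text is matched once
    return parts


print(split_subexpressions("abcdefg="))      # ['abcdefg=']
print(split_subexpressions("a=b?c+d*efg="))  # ['a=', 'b?', 'c+', 'd*', 'efg=']
```

Applied to the patterns from the first post, split_subexpressions(" x[A-Z]+ ") gives [" x[A-Z]+", " "], showing that the + covers the space and the x, while split_subexpressions(" x=[A-Z]+ ") gives [" x=", "[A-Z]+", " "].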
barry.marcus
Posts: 288
Joined: Thu Nov 16, 2006 1:05 pm

Post by barry.marcus »

Got it. Thanks.
Kai
Site Admin
Posts: 1271
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

Running "rex -x expr" from the command line is also useful when figuring out what "expr" really means to REX: it translates the expression into human-readable pseudocode.