Exclusion REX Question

velevi
Posts: 42
Joined: Thu Sep 08, 2005 12:21 pm

Exclusion REX Question

Post by velevi »

Dear Support Staff:

Why won't the following REX exclusion rule work (even when running a "New"-type walk):
folder1/folder2/[\alpha]+_print\.html

The pages matching this URL pattern are still indexed.

Same holds for:
var=Definitions
where I'd like to exclude a page URL containing this specific query variable from being indexed.

Am I missing something about Exclusion REX?
(Maybe I should mention that I have the following in Extra URLs REX: ^http://my\.site\.com/.+>>\.php\? since if I just had .php in the list of allowed extensions, PHP pages with URL query strings would not get indexed. Is that a known bug, or is that how it's supposed to be?)

ALSO the normal Exclusions field is not functioning for some reason:
/folder/folder2/ won't exclude the pages with that path from the database (?!).

Thank you!
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Exclusion REX Question

Post by mark »

Try
>>folder1/folder2/=[\alpha]+_print\.html
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Exclusion REX Question

Post by mark »

There's an option to strip query strings. Make sure it's off and that ? is not in your excludes (it is in there by default).

Make sure you're doing a new (not refresh) walk so that it's actually thinking about visiting those pages.
velevi
Posts: 42
Joined: Thu Sep 08, 2005 12:21 pm

Exclusion REX Question

Post by velevi »

OK, I removed the "?" from the Exclusions field. I have "Strip Queries" turned off. (So, this and the former fields do the same thing basically?)

I am definitely doing a "New" walk.

I'll try the change in the REX that you suggested!
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Exclusion REX Question

Post by mark »

Not the same at all really. Exclusions means "if this appears in a url just skip that page entirely". Strip queries means "remove the query string from the url before fetching the page". Both have to be set appropriately for urls with query strings to be included in the walk as-is.
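
The distinction between the two settings can be sketched in Python. This is only an illustration of the behaviour described above, not the walker's actual code: the function names, the example URL, and the order in which the two checks run are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def is_excluded(url, exclusions):
    """Exclusions: skip the page entirely if any listed string appears in the URL."""
    return any(token in url for token in exclusions)


def strip_query(url):
    """Strip queries: drop the query string (and fragment) before fetching."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def url_to_fetch(url, exclusions, strip_queries):
    """Return the URL that would be fetched, or None if the page is skipped.

    Assumption: exclusions are checked before the query string is stripped.
    """
    if is_excluded(url, exclusions):
        return None
    return strip_query(url) if strip_queries else url


# Hypothetical URL, used only for illustration.
url = "http://example.com/page.php?var=Definitions"
skipped = url_to_fetch(url, ["?"], strip_queries=False)   # "?" in excludes: page is skipped
stripped = url_to_fetch(url, [], strip_queries=True)      # query removed before fetching
as_is = url_to_fetch(url, [], strip_queries=False)        # fetched with its query string
```

Only the last combination (no "?" in the excludes and query stripping turned off) keeps the URL with its query string, which matches the point that both settings have to be right.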
velevi
Posts: 42
Joined: Thu Sep 08, 2005 12:21 pm

Exclusion REX Question

Post by velevi »

The corrected REX didn't work; the same pages are still included, despite the pattern being in the "Exclusion REX" field.

Any other suggestions?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Exclusion REX Question

Post by mark »

Show me your exact url and exact exclusions.
Do you have anything in "Extra URLs REX"? Show that too.
velevi
Posts: 42
Joined: Thu Sep 08, 2005 12:21 pm

Exclusion REX Question

Post by velevi »

mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Exclusion REX Question

Post by mark »

Exclusion rex 1 is not needed. It only walks sites/domains you tell it to.

Try these for the other 2 exclusion rexes:
>>\.php\?=!stat\=Definitions*stat\=Definitions
>>/folderhtml/html/=[\alpha]+_print\.html

For extra urls rex:
>>=http://anothersitehost\.anothersite\.com=/?>>=

Note that rex syntax is different than grep. Please see the rex docs.
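
For readers more used to grep-style syntax, the intent of the two exclusion expressions can be approximated with Python's `re` module. This is a rough sketch of what the rules are meant to catch, not a translation of REX semantics: `[A-Za-z]` standing in for `[\alpha]` is an assumption, the function name is made up, and the `example.com` URLs are hypothetical.

```python
import re

# Approximate intent of: >>/folderhtml/html/=[\alpha]+_print\.html
# Assumption: [\alpha] is treated here as ASCII letters only.
PRINT_PAGE = re.compile(r"/folderhtml/html/[A-Za-z]+_print\.html")

# Approximate intent of: >>\.php\?=!stat\=Definitions*stat\=Definitions
# i.e. ".php?" followed somewhere later by "stat=Definitions".
DEFINITIONS_QUERY = re.compile(r"\.php\?.*stat=Definitions")


def should_exclude(url):
    """Return True if the URL matches either approximated exclusion pattern."""
    return any(p.search(url) for p in (PRINT_PAGE, DEFINITIONS_QUERY))


# Print-style page from this thread, plus hypothetical query-string URLs.
print_page = should_exclude("http://sitehost.com/folderhtml/html/amyly_print.html")
query_page = should_exclude("http://example.com/page.php?id=3&stat=Definitions")
other_page = should_exclude("http://example.com/page.php?id=3")
```

Note that `search` matches anywhere in the URL, so no leading `.*` is needed in the Python patterns.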
velevi
Posts: 42
Joined: Thu Sep 08, 2005 12:21 pm

Exclusion REX Question

Post by velevi »

Thank you, the regular expressions finally worked. I know that 'grep' syntax is different from Texis rex, but things like '^' and '$' are supposed to be in rex as well, right? It seems that one always needs the '>>' anchor in rex, or is that not so? And how come there are two anchors in your suggestion for Extra URLs?

Thank you again!
--------------------------------------------------
URL to exclude 1:
http://sitehost.com/folderphp/sites.php ... efinitions
URL to exclude 2:
http://sitehost.com/folderhtml/html/amyly_print.html

Exclusion REX field:
^http://another\.site\.domain\.com/.*
.*\.php\?=.+stat\=Definitions
.*/folderhtml/html/[\alpha]+_print\.html

Extra URLs Field:
^http://anothersitehost\.anothersite\.com/$