problems with Exclude by Field & Extra Domain

renloe
Posts: 35
Joined: Mon Jan 31, 2005 12:51 pm

Post by renloe »

Our install has an extra domain defined (new.our.com), and it is basically a single index page with lots of 1 page links (none of the pages are interconnected via links).

The problem is that Webinator does not refresh the extra domain's index page and so never reads the new links that are added to it (almost daily).

I have tried to add the index page to the "Exclude by Field" section of the setup with the following settings:
Query: http://new.our.com/
Meta:
Field: URL
Exclude: Pages Only

This seems to do nothing. Please advise.
Thank You,

--Robert
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

"Extra Domains" isn't a page or url. It's a domain name. It doesn't cause any fetching. It allows fetching of urls with matching domains. Please see the docs.

Your index page with the links should be entered as a Base URL, a URL URL, or a Page URL.
renloe
Posts: 35
Joined: Mon Jan 31, 2005 12:51 pm

Post by renloe »

That's fine that it allows fetching of URLs with that domain.
Once the root page on new.our.com is in the index (which it is), shouldn't Webinator re-read that root page on every refresh, see that it has been updated with new links, follow those links, and add them to the index as well?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

You are excluding all the pages under http://new.our.com/ with that Exclude by Field rule, since the query matches every URL on that host. Only links to other domains that are otherwise allowed would get indexed.
John Turnbull
Thunderstone Software
renloe
Posts: 35
Joined: Mon Jan 31, 2005 12:51 pm

Post by renloe »

I have removed new.our.com from the Exclude by Field settings in the configuration.
I have also added http://new.our.com/ to the Base URLs.

New links have been added to the list on new.our.com, but the Webinator refresh walks still do not pick them up. What is wrong?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Are the pages with the new links due for refresh? Look up one of the URLs in question using List/Edit URLs to see when it was last fetched and when it's due for refresh. You can adjust the refresh settings near the bottom of the All Walk Settings page.
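
To illustrate why a frequent refresh walk can still skip a page, here is a minimal Python sketch. It assumes the walker keeps a last-fetched time and a refresh interval per URL and only re-fetches pages that are due; the function name, parameters, and dates are hypothetical and are not Webinator's actual implementation or settings.

```python
from datetime import datetime, timedelta

def is_due(last_fetched, interval, now, max_interval=None):
    """Return True if a page should be re-fetched on this walk.

    Hypothetical model: a page is due once `interval` has elapsed since
    it was last fetched. An optional `max_interval` caps the interval.
    """
    if max_interval is not None and interval > max_interval:
        interval = max_interval
    return now >= last_fetched + interval

# Made-up example: the index page was fetched 8 hours ago and has
# been assigned a 7-day refresh interval.
now = datetime(2005, 2, 1, 12, 0)
last_fetched = now - timedelta(hours=8)
interval = timedelta(days=7)

# Not due yet, so a walk that runs now skips the page.
skipped = not is_due(last_fetched, interval, now)

# With the interval capped at 6 hours, the same page is due.
due_with_cap = is_due(last_fetched, interval, now,
                      max_interval=timedelta(hours=6))
```

Under this model, running the walk more often does not help by itself; the page is only re-read once its own due time has passed.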
renloe
Posts: 35
Joined: Mon Jan 31, 2005 12:51 pm

Post by renloe »

Our refresh walks are set to hourly.
The due-for-refresh date was too far out. I have turned the max refresh time way down (to 6 hours), and this seems to have helped. Thanks!

So with the "Exclude by Field" settings, you can't exclude just a single URL? It excludes any URL that matches the query at all? Why have this option when you could add the URL to the "Exclusion Prefix" settings instead?
How would you exclude just a single URL, then?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

You could use Exclusion REX with >>= at the beginning and end of the URL to match an exact URL: >>=http://something/something/=>>=

Exclude by Field is something else. It's intended to exclude based on content and the like, and it gives you the option to exclude just the page text, just the links, or both. Exclusions, by contrast, match only on the URL and always exclude everything about that page, without even downloading it.
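
As a rough illustration of the difference between a rule that matches anywhere in a URL and one anchored to the exact URL, here is a small sketch. It is written in Python with Python's own `re` module, not Thunderstone REX syntax, and the function names are made up for this example.

```python
import re

def matches_anywhere(url, pattern):
    """Loose rule: the pattern may appear anywhere in the URL."""
    return pattern in url

def matches_exactly(url, target):
    """Anchored rule: the whole URL must equal the target."""
    return re.fullmatch(re.escape(target), url) is not None

urls = [
    "http://new.our.com/",
    "http://new.our.com/page1.html",
]

# The loose rule catches the index page and every page below it.
loose = [u for u in urls if matches_anywhere(u, "http://new.our.com/")]

# The anchored rule catches only the index page itself.
exact = [u for u in urls if matches_exactly(u, "http://new.our.com/")]
```

Here `loose` holds both URLs while `exact` holds only the index page, which is the behavior the anchored expression above is meant to give.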