problems with Exclude by Field & Extra Domain

Posted: Thu Mar 30, 2006 6:33 pm
by renloe
Our install has an extra domain defined (new.our.com), and it is basically a single index page with lots of links to single pages (none of the pages are interconnected via links).

The problem is that webinator does not refresh the extra domain's index page and read the new links that are added (almost daily).

I have tried to add the index page to the "Exclude by Field" section of the setup with the following settings:
Query: http://new.our.com/
Meta:
Field: URL
Exclude: Pages Only

This seems to do nothing. Please advise.
Thank You,

--Robert

Posted: Thu Mar 30, 2006 10:38 pm
by mark
"Extra Domains" isn't a page or URL. It's a domain name. It doesn't cause any fetching. It only allows fetching of URLs with matching domains. Please see the docs.

Your index page with the links should be entered as a Base URL, a URL URL, or a Page URL.
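
To illustrate the distinction, here is a minimal Python sketch of a simplified walk model. It is not Webinator's actual code: the function `plan_walk`, its parameters, and the host `www.our.com` are hypothetical and exist only for illustration. The point it shows is that extra domains widen the set of hosts a walk may follow links into, while only base URLs start any fetching.

```python
from urllib.parse import urlparse


def plan_walk(base_urls, extra_domains, discovered_links):
    """Return the URLs a walk would fetch under this simplified, assumed model."""
    # Hosts that links are allowed to point at: the base URL hosts plus extra domains.
    allowed_hosts = {urlparse(u).hostname for u in base_urls} | set(extra_domains)
    # Only base URLs seed the walk; an extra domain by itself adds nothing here.
    queue = list(base_urls)
    for link in discovered_links:
        if urlparse(link).hostname in allowed_hosts and link not in queue:
            queue.append(link)
    return queue


# new.our.com is allowed, but nothing links to it, so it is never fetched.
print(plan_walk(["http://www.our.com/"], ["new.our.com"], []))

# Listing the index page as a base URL is what actually gets it fetched.
print(plan_walk(["http://www.our.com/", "http://new.our.com/"], [], []))
```

In this model, an extra domain only matters once some fetched page links into it, which is why the index page itself needs to be given as a starting point.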

Posted: Fri Mar 31, 2006 2:29 am
by renloe
That's fine that it allows fetching of URLs with that domain.
Once the root page on new.our.com is in the index (which it is), shouldn't webinator be reading the root page on every refresh, seeing that it has been updated with new links, following those new links, and adding them to the index as well?

Posted: Mon Apr 03, 2006 4:55 pm
by renloe
I have removed the new.our.com from the Exclude by Field in the configuration.
I have also added http://new.our.com/ to the Base URLs.

New links have been added to the list on new.our.com, but the webinator refresh walks just do not pick up the new links. What is wrong?

Posted: Mon Apr 03, 2006 5:47 pm
by mark
Are the pages with the new links due for refresh? Look up one of the URLs in question using list/edit urls to see when it was last fetched and when it's due for refresh. You can adjust the refresh settings near the bottom of the all walk settings page.
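
As a rough illustration of why a frequent refresh walk can still skip a page, here is a small Python sketch of a due-date check. The scheduling model, the function names `is_due` and `pages_to_refetch`, and the page records are assumptions made for this example, not Webinator's real logic. It only shows the general idea: a refresh walk refetches a page once its due time has passed, however often the walk itself runs.

```python
from datetime import datetime, timedelta


def is_due(last_fetched, refresh_interval, now):
    """A page is due once its last fetch time plus its interval has passed."""
    return now >= last_fetched + refresh_interval


def pages_to_refetch(pages, now):
    """Return the URLs whose (last_fetched, refresh_interval) make them due at `now`."""
    return [
        url
        for url, (last_fetched, interval) in pages.items()
        if is_due(last_fetched, interval, now)
    ]


# Hypothetical records: the index page has a long interval, so it is skipped.
pages = {
    "http://new.our.com/": (datetime(2006, 4, 3, 12, 0), timedelta(days=7)),
    "http://new.our.com/a.html": (datetime(2006, 4, 1, 12, 0), timedelta(days=1)),
}
print(pages_to_refetch(pages, datetime(2006, 4, 3, 18, 0)))
```

Under this model, shortening the interval on the index page is what lets the next walk reread it and find the new links.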

Posted: Wed Apr 05, 2006 5:12 pm
by renloe
Our refresh walks are set to hourly.
The due for refresh date was too high. I have turned the max refresh time way down (6 hours), and this seems to have helped. Thanks!

So with the "Exclude by Field" settings, you can't just exclude a single URL? It excludes any URL that matches it at all? Why have this option there when you could add the URL to the "Exclusion Prefix" settings?
How would you exclude just a single URL then?

Posted: Wed Apr 05, 2006 5:56 pm
by mark
You could use exclusion rex with >>= at the beginning and end of the URL to match an exact URL. >>=http://something/something/=>>=

Exclude by field is something else. It's intended to exclude based on content and the like, and it gives you the option to exclude just the page text, just the links, or both. Exclusions, by contrast, apply only to the URL and always exclude everything about that page, without even downloading it.
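
The difference between matching a URL anywhere, matching it as a prefix, and matching it exactly can be sketched in Python. This uses Python string methods and the standard `re` module, not Thunderstone REX syntax, and the helper names and the page `page1.html` are hypothetical, chosen only to illustrate the three behaviors.

```python
import re


def matches_anywhere(query, url):
    """True if the query text occurs anywhere in the URL."""
    return query in url


def matches_prefix(prefix, url):
    """True if the URL starts with the given prefix."""
    return url.startswith(prefix)


def matches_exact(target, url):
    """True only if the whole URL equals the target (anchored at both ends)."""
    return re.fullmatch(re.escape(target), url) is not None


root = "http://new.our.com/"
child = "http://new.our.com/page1.html"  # hypothetical page under the root

for check in (matches_anywhere, matches_prefix, matches_exact):
    print(check.__name__, check(root, root), check(root, child))
```

Only the anchored check singles out the root page; the other two also catch every page beneath it, which is the effect described in the question above.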