Indexing query

Posted: Tue Oct 17, 2006 5:02 am
by legedza.henry
Hi there,

I have this site as a base url: www.decs.sa.gov.au/barossadistrict

It has only about 20 links on it, some of them offsite. The following are not being indexed, for the reasons listed:

Unwanted prefix http://www.decs.sa.gov.au/docs/files/co ... _Zones.doc

Unwanted prefix http://www.decs.sa.gov.au/decs_search/pages/locs

Unwanted prefix http://www.decs.sa.gov.au/svpst/pages/parents

Unwanted prefix http://www.decs.sa.gov.au/svpst

Unwanted prefix http://www.decs.sa.gov.au/

I have no prefixes entered; Offsite Pages is set to Y and Stay Under is set to No.

I would like it to index the pages above but not the whole of the www.decs.sa.gov.au domain.

Any suggestions as to how I can set up Webinator to do this? I have tried all sorts of variations and get either the above result or an index with far more results than I want.

Thanks
Henry

Posted: Tue Oct 17, 2006 1:08 pm
by mark
Not sure why you're getting "Unwanted prefix" if Stay Under is set to No. With Stay Under set to No, though, the walker is free to walk the entire site. If there is a specific list of files you want, a page file or page URL may be the way to go, assuming you don't want to specify them all as base URLs with a max depth of 0.

Posted: Tue Oct 17, 2006 7:33 pm
by legedza.henry
Apologies - Stay Under is set to Yes (I was looking at the wrong profile).

If I set Stay Under to No, it goes and indexes the entire site, which is huge.

I only want it to index those links that emanate from the base url, not the whole of www.decs.sa.gov.au.

Posted: Tue Oct 17, 2006 8:58 pm
by John
If you only want the 20 or so links on that page indexed, then you may want to set the depth to 1.

Posted: Tue Oct 17, 2006 9:29 pm
by legedza.henry
Unfortunately the 20 or so links aren't all on one page within the base url; they are a couple of levels down.

So if I change the depth value to get all of those, I also get lots of others from the main www.decs.sa.gov.au site.

Posted: Wed Oct 18, 2006 11:06 am
by mark
Let me see if I follow...
Given base url http://abc/def, you want to walk everything having the prefix abc/def, and you also want to get the individual pages that are linked from def but aren't under def. For those pages, you don't want to follow any further links.

You could modify the script slightly to treat the unwanted prefix pages as offsite pages, then set Offsite Pages to Y. Where you find <$reason="Unwanted prefix">, change it to <xtree insert $u offsite>.

Posted: Wed Oct 18, 2006 7:54 pm
by legedza.henry
I made the change to the script, and the result is that the previously "unwanted prefix" pages are now being completely ignored during indexing, even with Offsite Pages set to Yes.

E.g. all http://abc/def pages are being indexed,
and all completely offsite pages referenced by abc/def are being indexed,

but pages at http://abc, http://abc/lmn or http://abc/rst are not being indexed, and no error is being reported either.

Posted: Thu Oct 19, 2006 12:22 pm
by mark
Sorry, $u above should be $olinks:

<xtree insert $olinks offsite>
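To make the intended behaviour concrete, here is a minimal Python sketch of the walk policy discussed in this thread. It is not Webinator's Vortex script and does not use any Webinator API. The function name crawl, the offsite_pages flag and the in-memory link graph are all hypothetical, made up for illustration. The idea: pages under the base url are indexed and their links followed; pages linked from them but outside the base prefix are indexed as single pages and their links are not followed.

```python
from collections import deque

def crawl(base, links, offsite_pages=True):
    """Walk a link graph starting at `base` (hypothetical helper).

    links: dict mapping a URL to the list of URLs it links to.
    Returns (walked, leaf): pages under `base` that were followed,
    and pages outside `base` that were indexed but not followed.
    """
    prefix = base.rstrip("/") + "/"
    walked, leaf = [], []
    seen = {base}
    queue = deque([base])
    while queue:
        url = queue.popleft()
        walked.append(url)
        for target in links.get(url, []):
            if target in seen:
                continue
            seen.add(target)
            if target == base or target.startswith(prefix):
                # "Stay under": same prefix, so keep walking from here.
                queue.append(target)
            elif offsite_pages:
                # Outside the prefix: index the page itself, follow nothing.
                leaf.append(target)
    return walked, leaf

# Made-up link graph using the placeholder URLs from this thread.
LINKS = {
    "http://abc/def": ["http://abc/def/a", "http://other/x"],
    "http://abc/def/a": ["http://abc/def/a/b", "http://abc/lmn"],
    "http://abc/def/a/b": ["http://abc/rst", "http://abc/"],
    "http://abc/lmn": ["http://abc/huge1", "http://abc/huge2"],
}

walked, leaf = crawl("http://abc/def", LINKS)
print(walked)
print(leaf)
```

In this sketch http://abc/lmn ends up in the second list, but the pages it links to (huge1, huge2) are never visited, which is the "index those pages but not the whole domain" behaviour asked for above.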

Posted: Fri Oct 20, 2006 1:51 am
by legedza.henry
Yep, that seems to have done the trick as well. Thanks.