Indexing query

legedza.henry
Posts: 142
Joined: Wed Jul 24, 2002 11:52 pm

Post by legedza.henry »

Hi there,

I have this site as a base URL: www.decs.sa.gov.au/barossadistrict

It only has about 20 or so links on it, some of which are offsite. The following are not being indexed, for the reasons listed:

Unwanted prefix http://www.decs.sa.gov.au/docs/files/co ... _Zones.doc

Unwanted prefix http://www.decs.sa.gov.au/decs_search/pages/locs

Unwanted prefix http://www.decs.sa.gov.au/svpst/pages/parents

Unwanted prefix http://www.decs.sa.gov.au/svpst

Unwanted prefix http://www.decs.sa.gov.au/

I have no prefixes entered; Offsite Pages is set to Y and Stay Under is set to No.

I would like it to index the pages above but not the whole of the www.decs.sa.gov.au domain.

Any suggestions as to how I set up Webinator to do this? I have tried all sorts of variations and either get the above result or an index with far more results than I want.

Thanks
Henry
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Not sure why you're getting "Unwanted prefix" if Stay Under is set to No. But with Stay Under set to No, it is free to walk the entire site. If you have a specific list of files you want, perhaps a page file or page URL would be the way to go, if you don't want to specify them all as base URLs with a max depth of 0.
legedza.henry
Posts: 142
Joined: Wed Jul 24, 2002 11:52 pm

Post by legedza.henry »

Apologies: Stay Under is set to Yes (I was looking at the wrong profile).

If I set Stay Under to No, it goes and indexes the entire site, which is huge.

I only want it to index the links that emanate from the base URL, not the whole of www.decs.sa.gov.au.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

If you only want the 20 or so links on that page indexed, then you may want to set depth 1.
John Turnbull
Thunderstone Software
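To illustrate the depth setting, here is a minimal Python model of a depth-limited, breadth-first walk. It is not Webinator's code: the link graph is a hypothetical in-memory dict standing in for fetched pages, `crawl_to_depth` is an illustrative name, and it assumes depth 0 means the base URL alone and depth 1 adds only the pages it links to directly.

```python
from collections import deque

# Hypothetical link graph: each URL maps to the URLs it links to.
SAMPLE_LINKS = {
    "http://abc/def": ["http://abc/def/a", "http://abc/lmn"],
    "http://abc/def/a": ["http://abc/rst"],
    "http://abc/lmn": ["http://abc/xyz"],
}


def crawl_to_depth(links, base, max_depth):
    """Breadth-first walk from base, following links up to max_depth.

    Depth 0 is the base URL itself; pages at max_depth are fetched
    but their links are not followed.
    """
    seen = {base}
    order = []
    queue = deque([(base, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # fetched, but do not follow its links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


if __name__ == "__main__":
    for d in range(3):
        print(d, crawl_to_depth(SAMPLE_LINKS, "http://abc/def", d))
```

In this model, depth 1 returns the base page plus its direct links, while depth 2 also pulls in http://abc/xyz through http://abc/lmn, which is the kind of spillover a single global depth cannot avoid when the wanted links sit a few levels down.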
legedza.henry
Posts: 142
Joined: Wed Jul 24, 2002 11:52 pm

Post by legedza.henry »

Unfortunately, the 20 or so links aren't all on one page within the base URL; they are a couple of levels down.

So if I change the depth value to get all of those, I also get lots of other pages from the main www.decs.sa.gov.au site.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Let me see if I follow...
Given base URL http://abc/def, you want to walk everything with the prefix abc/def, and also fetch the individual pages that are linked from def but aren't under def. For those pages you don't want to follow any further links.

You could modify the script slightly to treat the unwanted-prefix pages as offsite pages, then set Offsite Pages to Y. Where you find <$reason="Unwanted prefix">, change it to <xtree insert $u offsite>.
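The crawl policy described above can be modelled in a few lines of Python. This is a sketch of the intended behaviour only, not of the Webinator script or of what the `<xtree insert ... offsite>` change does internally: the link graph and the `under_prefix` / `crawl_prefix_with_leaves` names are hypothetical, and it assumes offsite pages are fetched and indexed but their links are not followed.

```python
from collections import deque

# Hypothetical link graph, using the abc/def example from the thread.
SAMPLE_LINKS = {
    "http://abc/def": ["http://abc/def/a", "http://abc/lmn", "http://other/x"],
    "http://abc/def/a": ["http://abc/def/a/b", "http://abc/rst"],
    "http://abc/lmn": ["http://abc/xyz"],
    "http://abc/rst": ["http://abc/"],
    "http://other/x": ["http://other/y"],
}


def under_prefix(url, base):
    """True if url is base itself or a path below it (abc/def, not abc/define)."""
    base = base.rstrip("/")
    return url == base or url.startswith(base + "/")


def crawl_prefix_with_leaves(links, base):
    """Walk everything under base; fetch outside links once without following them."""
    seen = {base}
    indexed = []
    queue = deque([base])
    while queue:
        url = queue.popleft()
        indexed.append(url)
        if not under_prefix(url, base):
            continue  # treated like an offsite page: indexed, links not followed
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed
```

Pages under http://abc/def are walked in full, while http://abc/lmn, http://abc/rst and http://other/x are indexed as leaves, so http://abc/xyz and http://abc/ never enter the index. Matching on the prefix plus a trailing "/" keeps a sibling such as http://abc/define from counting as "under" abc/def.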
legedza.henry
Posts: 142
Joined: Wed Jul 24, 2002 11:52 pm

Post by legedza.henry »

I made the change to the script, and now the previously "unwanted prefix" pages are being completely ignored during indexing, even with Offsite set to Yes.

e.g. all http://abc/def pages are being indexed,
all completely offsite pages referenced by abc/def are being indexed,

but the pages at http://abc, http://abc/lmn or http://abc/rst are not being indexed, and no error is reported for them either.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Sorry, $u above should be $olinks:

<xtree insert $olinks offsite>
legedza.henry
Posts: 142
Joined: Wed Jul 24, 2002 11:52 pm

Post by legedza.henry »

Yep, that seems to have done the trick as well. Thanks.