Excluding a spider page

smallguy
Posts: 7
Joined: Tue Jan 08, 2002 6:03 am

Post by smallguy »

using:

Webinator WWW Site Indexer Version 2.56 (Commercial)
Release: 20010814

We have a page that lists every page in the site, to make the site easier for Webinator to spider (some links are generated with JavaScript).

The only problem is, this spider page is getting spidered itself. I've tried using robots.txt in the form of:

Disallow: /dir/spider.html

But this wasn't picked up. I tried using the -x argument but discovered this stopped the page from being read in the first place.

Does anyone have any suggestions? I need this page to be read so its links are followed, but left out of the index itself.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

That would require a meta robots tag within the HTML page itself (<meta name="robots" content="NOINDEX,FOLLOW">) and Webinator 4, which supports meta robots. Webinator 2 does not support meta robots.

The alternative is to do the walk and delete the list page afterwards. See the Webinator manual for how to remove pages from the database.

BTW, your robots.txt syntax is incomplete. See the Webinator manual for a description of the syntax.
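
To illustrate what "incomplete" likely means here: under the standard robots exclusion format, a Disallow line only counts when it belongs to a record that starts with a User-agent line. The sketch below uses Python's standard-library parser, not Webinator's, so it shows the general convention only; the helper name `allowed` is made up for the example, and the Webinator manual remains the authority on what Webinator accepts.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The rule as posted: a Disallow line with no User-agent line before it.
INCOMPLETE = "Disallow: /dir/spider.html\n"

# The same rule inside a complete record.
COMPLETE = "User-agent: *\nDisallow: /dir/spider.html\n"

URL = "http://domain.com/dir/spider.html"

print(allowed(INCOMPLETE, URL))  # the orphan rule is ignored by this parser
print(allowed(COMPLETE, URL))    # the rule now applies
```

Keep in mind that a Disallow rule that does take effect normally keeps a crawler from fetching the page at all, so on its own it would probably behave like -x rather than give "read but not indexed".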
smallguy
Posts: 7
Joined: Tue Jan 08, 2002 6:03 am

Post by smallguy »

I'm trying to delete using the following:

C:\Inetpub\cgi>texis -d "C:\Program Files\Thunderstone Software\Webinator2\folder\db_all" -s "DELETE FROM html WHERE Url='domain.com/spider.html'"

But I get the following back:

000 Mar 21 09:49:17 Insufficient permissions on html in the function ipreparetree
000 Insufficient permissions on html in the function ipreparetree
000 Mar 21 09:49:17 SQLPrepare() failed with -1 in the function prepntexis
000 SQLPrepare() failed with -1 in the function prepntexis


I've also tried referencing the row in the table by its id, with the same outcome.

What am I doing wrong?
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

gw creates the tables as the Texis user "_SYSTEM", so you need to run texis as that user. Give it the
-u _SYSTEM -p ""
options.
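
Combined with the command from the earlier post, the call might look like the sketch below. It only assembles the argument list in Python, so it runs without Texis installed. The position of -u and -p among the other options is an assumption, `texis_delete_args` is a hypothetical helper name, and the path and URL are the ones quoted above.

```python
import subprocess


def texis_delete_args(db_path, url, user="_SYSTEM", password=""):
    """Build the texis argument list for deleting one URL from the html table.

    Option order is an assumption; -d, -s, -u and -p are the options
    mentioned in this thread. `url` is assumed to be a trusted value.
    """
    sql = "DELETE FROM html WHERE Url='{}'".format(url)
    return ["texis", "-d", db_path, "-u", user, "-p", password, "-s", sql]


args = texis_delete_args(
    r"C:\Program Files\Thunderstone Software\Webinator2\folder\db_all",
    "domain.com/spider.html",
)

# Show the equivalent Windows command line, with quoting applied.
print(subprocess.list2cmdline(args))

# To actually run it on a machine with texis on the PATH:
# subprocess.run(args, check=True)
```

Passing the arguments as a list avoids having to hand-quote the path with spaces and the empty password.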