Webinator: Question on 10,000 Page limit

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



According to the product description, Webinator can be used
for free and will index up to 10,000 pages per index
<http://www.thunderstone.com/webinator/#prodes>.

I know I have 3,000 or fewer files in my htdocs tree. After
running gw on my tree, gw finishes with these two lines:

591/11000
Visited 626 pages total

I don't think my whole htdocs tree is being walked by
Webinator. I have excluded several directories in my
robots.txt file, which might explain why gw seems to stop
at page 626, but 626 still seems too small. Can anyone explain
the fraction (591/11000)? This number keeps going up as gw
runs. I'm wondering whether I've reached my page limit through
some technicality that I can control, or whether something
else is wrong with the way gw walks my site.


--John-------------------------------------------------+
John R. Little Web Developer/Systems Librarian |
Perkins Library * Duke University * Durham, NC * USA |
VOICE: (919) 660-5932 Email: jrl@duke.edu |
http://www.duke.edu/~jrl/ |
+------------------------------------------------------+



Post by Thunderstone »




591 is the number of pages actually retrieved in this run.
11000 is the number of hrefs seen (offsite or not).
626 is the number of pages attempted in this run.
You will receive a message if you hit the page limit.
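
As a rough illustration of how three such counters can diverge, here is a
minimal Python sketch of a toy link-following crawl over an in-memory site.
It is not gw's code: the SITE dictionary, the crawl function, and the choice
to count each distinct href once (including the start URL) are all
assumptions made for the example.

```python
from collections import deque

# Hypothetical toy site: URL -> list of hrefs found on that page.
# None stands for a page that cannot be fetched (for example a 404).
SITE = {
    "/": ["/a.html", "/b.html", "http://elsewhere.example/"],
    "/a.html": ["/b.html", "/missing.html"],
    "/b.html": ["/"],
    "/missing.html": None,
}

def crawl(start):
    """Return (retrieved, hrefs_seen, attempted) for a toy crawl."""
    seen = {start}            # every distinct href encountered, offsite or not
    queue = deque([start])    # on-site pages still waiting to be fetched
    attempted = 0
    retrieved = 0
    while queue:
        url = queue.popleft()
        attempted += 1
        links = SITE.get(url)
        if links is None:
            continue          # fetch failed: attempted but not retrieved
        retrieved += 1
        for href in links:
            if href not in seen:
                seen.add(href)
                if href.startswith("/"):   # only on-site links get queued
                    queue.append(href)
    return retrieved, len(seen), attempted

retrieved, hrefs_seen, attempted = crawl("/")
print(f"{retrieved}/{hrefs_seen}")
print(f"Visited {attempted} pages total")
```

In this toy model the hrefs-seen figure grows fastest, because it also counts
offsite and not-yet-fetched links, while attempted can exceed retrieved
whenever a fetch fails.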

The number of files in the htdocs tree is somewhat irrelevant. Webinator
walks your site as a web browser does, not directly through the filesystem.
It will only retrieve pages that you specify on the command line or that are
referenced on a fetched page. Your results indicate that not every file in
your htdocs tree is referenced by your live pages.
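
To make the difference between files on disk and linked pages concrete, here
is a small Python sketch that compares a file listing with the set of pages
reachable by following links from a start page. The file names, the links
dictionary, and the reachable helper are hypothetical; a real check would
have to extract the links from your actual pages.

```python
from collections import deque

def reachable(links, start):
    """Return the set of pages reachable from start by following links."""
    found = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found

# Hypothetical htdocs contents (what a directory listing would show).
files_on_disk = {
    "index.html", "about.html", "news.html",
    "old/draft.html", "old/notes.html",
}

# Hypothetical link structure: page -> pages it links to.
links = {
    "index.html": ["about.html", "news.html"],
    "about.html": ["index.html"],
    "old/draft.html": ["old/notes.html"],
}

crawled = reachable(links, "index.html")
orphans = sorted(files_on_disk - crawled)
print(orphans)  # files that exist on disk but that no crawled page links to
```

Here the two files under old/ link to each other but are never referenced
from index.html, so a link-following walk that starts at index.html never
reaches them, even though they exist in the tree.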

Post by Thunderstone »



Are all the files in the

webinator/db

subdirectory needed?

It seems like I have many unusual files and I wish to start fresh.

The

gw -wipe

command doesn't seem to clear the subdirectory.




------------------------------------------------------------
Roland Leong, Publisher and Editor
Shotgun Report-Internet Magazine for Clay Target Shotgun Sports
<http://www.shotgunreport.com> Email:publish@shotgunreport.com