We run Webinator Version 2.55 Release: 20000120, and are getting strange results from it:
If I type in "cricket", some documents with the word "cricket" somewhere in the body text appear, but documents in which "cricket" occurs more frequently are not picked up at all!
Is there something I can do to my Vortex search script to return better results?
Here is the complete gw command (taken from a once-weekly crontab procedure):
# - Change dir to webinator:
cd /www/httpd/html/webinator
# - Instruct gw to rewalk globalDB
bin/gw -rewalk -dglobalDB
# - Change owner,group,mode of globalDB after rewalk:
chown nobody globalDB
chgrp nobody globalDB
chmod 775 globalDB
# - Change dir to globalDB:
cd /www/httpd/html/webinator/globalDB
# - Change mode of all tables in globalDB:
chmod 775 *.*
The first thing to check is whether those pages were actually indexed: look at gw.log to see if they were retrieved successfully.
If they were, and you can find those pages by searching for other terms, look at the match info and make sure "cricket" actually occurs in the text that Webinator indexed.
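A minimal sketch of that gw.log check, assuming only that the log is plain text and mentions the URLs or paths that were walked (nothing about its exact layout is assumed). The function name count_log_mentions is made up for illustration. It counts the log lines that mention a given directory or URL fragment, so a result of 0 suggests the walk never reached that directory.

```shell
# Hypothetical helper: count the lines of a log file that mention a
# given directory or URL fragment.
count_log_mentions() {
    fragment=$1   # e.g. the directory name of the missing document
    logfile=$2    # e.g. /path/to/webinator/DBname/gw.log
    # -F: match the fragment as a fixed string, not a regular expression
    # -c: print the number of matching lines instead of the lines themselves
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # aborting a calling script that runs under "set -e".
    grep -F -c -- "$fragment" "$logfile" || true
}
```

It would be called with the directory name as the first argument and the log path as the second, for example with /path/to/webinator/DBname/gw.log as the log.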
I checked /path/to/webinator/DBname/gw.log for its last sweep (yesterday), and there is no mention of the directory in which the desired document is located!
I checked the permissions on this directory: it and its documents are all 775, and their ownership matches that of directories that were indexed.
As you can see from my crontab extract, we are rewalking. As I remember it, the initial index was set to walk about 10 domains under our top domain of .anglia.ac.uk, so in theory all directories under it (including the target directory) should be re-indexed.
Our robots.txt has no entry excluding access to this directory. Could I possibly be looking at the wrong gw.log file? Doing a >locate *gw.log brought up a page or so of results.
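One way to narrow down which gw.log belongs to the weekly rewalk is to sort the candidates by modification time, since the log written by yesterday's sweep should be among the newest. A rough sketch follows; the function name newest_first is made up for illustration, and the usage line in the comment assumes none of the paths contain spaces.

```shell
# Hypothetical helper: print the given files ordered by modification
# time, most recently modified first.
newest_first() {
    # ls -t sorts by modification time, newest first;
    # "--" stops option parsing in case a path starts with "-".
    ls -t -- "$@"
}

# Possible usage (not run here, requires locate and its database):
#   newest_first $(locate gw.log) | head
```

The file at the top of the list with a timestamp matching the crontab run is the likeliest candidate for the log of globalDB.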
The pages that were retrieved for the search term 'cricket' did feature this term in the body text according to 'match info', just not the desired document, which makes it all the more confusing that some pages containing the search word are not displayed as a result.
Does it matter that its URL includes the '/' at the end, while many of the successfully indexed directories don't:
Have the files in the directory you wanted indexed actually been indexed? I noticed that some of the links had .shtml extensions. Were these included in the original walk with a -fshtml option?
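To see quickly whether the missing directory consists mostly of .shtml files, the files on disk can be tallied by extension. This is only a sketch using standard find, sed, sort and uniq; the function name count_by_extension is made up for illustration, and it says nothing about which extensions gw was actually told to walk.

```shell
# Hypothetical helper: count the files under a directory, grouped by
# file extension.
count_by_extension() {
    dir=$1
    # -name '*.*' keeps only files whose own name contains a dot, so a
    # dot in a parent directory name is never mistaken for an extension.
    # sed strips everything up to and including the last dot.
    find "$dir" -type f -name '*.*' | sed 's/.*\.//' | sort | uniq -c
}
```

Each output line is a count followed by an extension, so a large shtml count next to a small html count would point towards the extension question above.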