Webinator questions

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



Hi,

I have recently installed Webinator (great piece of software BTW!)
and have a few (quick) questions:

1) I joined the mailing list, but I have had no reply. Is this
to be expected? The address I used to join was "Glynn.Robinson@bl.uk"
with a real name of "Glynn Peter Robinson"

2) Is there any way of wiping the "todo" list?

3) How do you get Webinator to search anything other than the default
database?

4) Is there a generally accepted way to index other people's sites
without upsetting them too much? I.e. index late at night, only index
pages every month or so?

Thanks for your help.

Glynn Robinson

Post by Thunderstone »




The order in which documents are returned is based on their insertion
order into the database. If you wish to enforce a retrieval ordering,
place the URLs of the high-priority documents in a file and use
the "&file" option ( http://www.thunderstone.com/gwman/node24.html ).

In a fresh database, issue the command: gw -b "&high_prior.lst"
The -b option forces Webinator to do a breadth-first walk.
( http://www.thunderstone.com/gwman/node22.html )
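
As a rough sketch of the step above, the list file could be built like this. The helper name write_priority_list and the example.com URLs are hypothetical, and the one-URL-per-line layout is an assumption; check the "&file" documentation linked above for the exact format.

```shell
# Hypothetical helper: write the priority URLs to a list file,
# one URL per line (assumed format for the "&file" option).
write_priority_list() {
    list_file=$1
    shift
    # printf repeats the format once per remaining argument,
    # so each URL ends up on its own line.
    printf '%s\n' "$@" > "$list_file"
}

# Placeholder URLs, listed in the order they should be inserted.
write_priority_list high_prior.lst \
    "http://www.example.com/" \
    "http://www.example.com/products/"

# Then, in a fresh database, run the command from the post:
# gw -b "&high_prior.lst"
```

The gw call is left commented out so the sketch only prepares the file; the walk itself is still started by hand.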



To delete with case respected, try this command first:

gw -sc "select Url from html where Url like '/\R/edu/ILM'"

If that works as expected, then do:

gw -sc "delete from html where Url like '/\R/edu/ILM'"
gw -sc "delete from refs where Ref like '/\R/edu/ILM'"

Then re-index the database with: gw -index

The commands above use the regular expression matcher within
Texis' like clause. See: http://www.thunderstone.com/texisman/node113.html
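
The preview-then-delete sequence above can be wrapped in a small dry-run helper so the same pattern is used in every statement. This is only a sketch: gw_delete_plan is a hypothetical name, and it prints the commands rather than running them, so nothing touches the database until you paste them in yourself.

```shell
# Hypothetical helper: print the preview, delete, and re-index commands
# for one pattern. It only prints; it never calls gw itself.
gw_delete_plan() {
    pattern=$1
    # %s prints the argument literally, so the backslash in /\R is preserved.
    printf '%s\n' "gw -sc \"select Url from html where Url like '$pattern'\""
    printf '%s\n' "gw -sc \"delete from html where Url like '$pattern'\""
    printf '%s\n' "gw -sc \"delete from refs where Ref like '$pattern'\""
    printf '%s\n' "gw -index"
}

# Same pattern as in the post above.
gw_delete_plan '/\R/edu/ILM'
```

Run the printed select line first and confirm it lists only the URLs you mean to remove before running the two delete lines.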

Hope this helps,

Bart Richards
Thunderstone



Post by Thunderstone »




I have noticed different behavior when indexing frames.
Does Webinator have a problem with frames?

Also, I have some HTML files that use server-side includes and have
been renamed to xxxx.shtml.

Is Webinator able to index files containing server-side includes?

Great program,
thanks,
kelsey

Post by Thunderstone »




Webinator fetches all frames on a framed page into one record. That
way Webinator's view of the document is closer to what the web surfer
sees.


Webinator doesn't care what the server does: it indexes the HTML the
server returns, after any server-side includes have been expanded.
To get .shtml files you just need to tell Webinator that's a valid
extension with the -f option:
gw -fshtml http://www.mysite.com/
See: http://www.thunderstone.com/gwman/node21.html