Can't index all of the site

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm


Post by Thunderstone



Hello,
We downloaded and installed Webinator, but it only indexes about 1/3 of our
site. Any help would be appreciated.
Derek Moores




Post by Thunderstone



You don't give any specifics (like what command line you used and what
pages were missing), so I'll throw out some general ideas.

gw will respect robots.txt if present (see -r).
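If you suspect robots.txt is the culprit, you can check your own rules with Python's standard urllib.robotparser. This is a generic sketch, not part of Webinator: the sample robots.txt is hypothetical, and since the thread doesn't say which user-agent name gw sends, the check uses the `*` rules that apply to any robot.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt that hides everything under /private/ from all robots
sample = """User-agent: *
Disallow: /private/
"""

print(allowed(sample, "http://www.yoursite.com/index.html"))      # True
print(allowed(sample, "http://www.yoursite.com/private/a.html"))  # False
```

Paste in your real robots.txt and a few of the missing URLs to see whether they are excluded.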

gw will only index .txt, .html, and .htm files by default (see -f).
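Here is a rough Python sketch of what an extension-based filter like this does. It assumes the filter looks only at the extension of the URL's path; gw's real matching, and how it treats extensionless URLs such as a bare `/`, may differ. Under that assumption, dynamic pages such as `.php` or `.asp` would be skipped by default, which is presumably what -f lets you change.

```python
import posixpath
from urllib.parse import urlparse

# The default types named in this reply; gw's actual -f handling may differ.
DEFAULT_EXTENSIONS = {".txt", ".html", ".htm"}


def has_indexable_extension(url: str, extensions=DEFAULT_EXTENSIONS) -> bool:
    """Check only the extension of the URL's path (query strings are ignored)."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1]
    return ext in extensions


for u in [
    "http://www.yoursite.com/about.html",
    "http://www.yoursite.com/notes.txt",
    "http://www.yoursite.com/catalog.php?id=7",
    "http://www.yoursite.com/manual.pdf",
]:
    print(u, has_indexable_extension(u))
```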

gw does not follow links in Java applets, JavaScript, or server-side image maps.
(It does follow client-side image maps.)
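To see why image maps and scripts behave differently, here is a small sketch using Python's standard html.parser (not gw's own parser). It collects only the links a crawler that doesn't run scripts can read directly: `<a href>` and client-side `<area href>`. The page is a made-up example. A server-side (`ismap`) map yields only the link to the map file itself, because where a click lands is computed on the server from the click coordinates, and URLs built by JavaScript never appear as plain links.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hrefs that can be followed without running any scripts."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        # Plain URLs live in <a href> and in client-side image map <area href>
        if tag not in ("a", "area") or not href:
            return
        # javascript: pseudo-links only do something if a script engine runs them
        if href.strip().lower().startswith("javascript:"):
            return
        self.links.append(href)


page = """
<a href="/maps/nav.map"><img src="nav.gif" ismap></a>
<img src="menu.gif" usemap="#menu">
<map name="menu">
  <area shape="rect" coords="0,0,50,20" href="/products.html">
  <area shape="rect" coords="50,0,100,20" href="/support.html">
</map>
<a href="javascript:openCatalog()">Catalog</a>
<script>window.location = "/hidden.html";</script>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/maps/nav.map', '/products.html', '/support.html']
```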

gw does not index pages that nothing links to. It only indexes what a user
could reach by starting at the URL you provide and clicking links with Java
and JavaScript turned off.
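In other words, the crawl is a graph traversal from the starting URL. A minimal breadth-first sketch over a hypothetical in-memory link map (no HTTP involved) shows why an orphan page is never found, even though it exists on the server and even links back into the site:

```python
from collections import deque


def reachable(start, links):
    """Return every page reachable from `start` by following links.

    `links` maps each page to the list of pages it links to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: /old-promo.html exists, but no other page links to it.
site = {
    "/index.html": ["/about.html", "/products.html"],
    "/products.html": ["/widget.html"],
    "/about.html": ["/index.html"],
    "/widget.html": [],
    "/old-promo.html": ["/index.html"],
}

print(sorted(reachable("/index.html", site)))
# ['/about.html', '/index.html', '/products.html', '/widget.html']
```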

gw does not follow offsite links unless told to do so (see -o -domain -network).
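A Python sketch of the strictest possible onsite rule, assuming a link counts as onsite only when its hostname exactly matches the starting URL's. It is illustrative only; the thread doesn't spell out how -o, -domain and -network widen it. Under this strict rule `yoursite.com` and `www.yoursite.com` are different hosts, which is the same kind of alias mismatch mentioned further down.

```python
from urllib.parse import urlparse


def same_host(url: str, start_url: str) -> bool:
    """Strict check: onsite only if the hostnames match exactly."""
    # .hostname is lowercased and has any port number stripped
    return urlparse(url).hostname == urlparse(start_url).hostname


start = "http://www.yoursite.com/"
print(same_host("http://www.yoursite.com/faq.html", start))  # True
print(same_host("http://partner.example.org/", start))       # False
print(same_host("http://yoursite.com/faq.html", start))      # False
```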

gw does not follow /cgi-bin links unless told to do so (see -C).

gw does not enter protected areas unless given the password (see -U -P).
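Assuming the protected area uses HTTP Basic authentication (the thread doesn't say which schemes gw supports), the credentials travel in an Authorization header like the one this Python sketch builds. The URL, user name, and password are placeholders. Passing the request to `urllib.request.urlopen()` is a quick way to confirm the credentials work before giving them to gw.

```python
import base64
from urllib.request import Request


def authorized_request(url: str, user: str, password: str) -> Request:
    """Build a request carrying HTTP Basic credentials (RFC 7617)."""
    # Basic auth sends base64("user:password"); it is encoding, not encryption
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = Request(url)
    req.add_header("Authorization", "Basic " + token)
    return req


req = authorized_request("http://www.yoursite.com/members/", "user", "pass")
print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNz
```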

gw does not submit forms.

Using the -L option can cause gw to skip aliases (yoursite.com vs www.yoursite.com).

gw truncates large pages, possibly losing URLs in the process (see -z).
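This Python sketch shows how truncation drops links: only anchors in the part of the page that is kept get collected. It cuts by characters rather than bytes for simplicity, and the limit of 100 is arbitrary; the thread doesn't give gw's default -z size. The page is a made-up example with one link near the top and one at the bottom.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag seen."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def links_within(html: str, limit: int) -> list:
    """Return the <a href> links found in the first `limit` characters only."""
    parser = HrefCollector()
    parser.feed(html[:limit])
    parser.close()
    return parser.hrefs


body = "<p>" + "x" * 200 + "</p>"
html = '<a href="/top.html">Top</a>' + body + '<a href="/bottom.html">Bottom</a>'

print(links_within(html, len(html)))  # ['/top.html', '/bottom.html']
print(links_within(html, 100))        # ['/top.html']
```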

And, of course, if you interrupt it, it will stop.



