Site Indexing Problem, SCO Webinator


Post by Thunderstone »



Mark:
~~~~~~~~~~~~~~~~~~~~~~~~~
Re: On Wed, 4 Dec 1996, Mark Willson wrote:
-> > I'm having difficulty with using 'gw' to index my site.
-> ..
-> > When I try to access one of my sites (http://www.nurse.net/index.html),
-> > I get an error from gw about "Interrupted system call ... Can't open
-> > connection to ...: timeout"; see captured output below. ...
-> > ...
->
-> The timeout is not directly related to Webinator, but to the response speed
-> of the site you are trying to index. The connection between the machine
-> you are running gw on and www.nurse.net is apparently slow. You can increase
-> the time gw will wait for a response with the -t option (e.g. -t60).
-> See http://www.thunderstone.com/gwman/node22.html .
->
-> Also make sure the machine you are running gw on can reach the machine
-> you are trying to index. From the webinator/bin directory, try:
-> ./geturl http://www.nurse.net/index.html
~~~~~~~~~~~~~~~~~~~~~~~~~
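
The two checks suggested above (a longer timeout, and confirming the indexing machine can reach the site at all) can be approximated outside of gw. This is only a minimal sketch using Python's standard library, not a replacement for geturl; the function name check_url and its return labels are made up for illustration, and the 60-second default simply mirrors the -t60 example.

```python
import socket
import urllib.error
import urllib.request


def check_url(url, timeout=60.0):
    """Try to fetch url once and classify the outcome.

    Returns a (status, detail) pair where status is one of
    "ok", "http-error", "timeout" or "unreachable".
    These labels are illustrative, not anything gw reports.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "ok", f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 500, ...).
        return "http-error", f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Name lookup failure, refused connection, or connect timeout.
        if isinstance(exc.reason, socket.timeout):
            return "timeout", str(exc.reason)
        return "unreachable", str(exc.reason)
    except socket.timeout as exc:
        # The connection opened but the response did not arrive in time.
        return "timeout", str(exc)
    except OSError as exc:
        # Any other socket-level failure, e.g. a reset connection.
        return "unreachable", str(exc)


if __name__ == "__main__":
    # URL taken from the message above; the result depends on your network.
    print(check_url("http://www.nurse.net/index.html", timeout=60))
```

A "timeout" result points at a slow link or server, while "unreachable" suggests a name resolution or routing problem that a longer timeout will not fix.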

Mark, thanks for the suggestions. It turns out not to be a timing
problem: 'gw' and the site are on the same machine. However, when geturl
couldn't access the URL, I started playing around with how I specified
the URL. My machine's name is intwiz.wizards.net (name associated with
the IP address specified on my NIC). There are a number of web sites on
the machine (as virtual hosts having their own IP addresses aliased onto
my NIC also). I'm hoping to index all these sites, for instance
    nurse.net    in the Web server /www directory
    wizards.net  in the Web server /www directory
    etc.

My attempts to reference the URL as http://www.nurse.net/index.html
failed as I've mentioned. Things do work however with the URL
specification of http://intwiz.wizards.net/nurse.net/index.html . That
solves my first problem (are there any other means of specifying this
URL?) but introduces two others <s>.
- many of the 'href' links on the pages of the nurse.net site make
use of my virtual host aliasing and are specified via
http://www.nurse.net/some-page.html
- some of the 'related' sites cross-reference each other's pages

gw flags both kinds of links as 'off site' and doesn't index them (the
-o option is only a partial solution). Is there some way I can run gw so
that it
- recognizes that http://intwiz.wizards.net/nurse.net/... and
http://www.nurse.net/... are the same site
- will follow the cross-referenced links between my related sites
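
To make the equivalence being asked for concrete, here is a minimal sketch of the URL mapping involved. It is not something gw is known to do, and it is not a gw option. The names PRIMARY_HOST, VIRTUAL_HOSTS, canonicalize and is_local are hypothetical; the /nurse.net prefix comes from the working URL above, while the www.wizards.net entry is assumed by analogy.

```python
from urllib.parse import urlsplit, urlunsplit

# The machine's real name, as given in the message.
PRIMARY_HOST = "intwiz.wizards.net"

# Virtual host name -> path prefix under PRIMARY_HOST.
# The www.wizards.net entry is an assumption made for illustration.
VIRTUAL_HOSTS = {
    "www.nurse.net": "/nurse.net",
    "www.wizards.net": "/wizards.net",
}


def canonicalize(url):
    """Rewrite a virtual-host URL to its PRIMARY_HOST form.

    URLs whose host is not in VIRTUAL_HOSTS are returned unchanged.
    Any explicit port on a virtual-host URL is dropped in this sketch.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    prefix = VIRTUAL_HOSTS.get(host)
    if prefix is None:
        return url
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, PRIMARY_HOST, prefix + path, parts.query, parts.fragment)
    )


def is_local(url):
    """True if the URL, once canonicalized, lives on PRIMARY_HOST."""
    return urlsplit(canonicalize(url)).hostname == PRIMARY_HOST


if __name__ == "__main__":
    for link in (
        "http://www.nurse.net/some-page.html",
        "http://intwiz.wizards.net/nurse.net/index.html",
    ):
        print(link, "->", canonicalize(link), is_local(link))
```

With a mapping like this, a link written as http://www.nurse.net/some-page.html and one written as http://intwiz.wizards.net/nurse.net/some-page.html reduce to the same string, so neither would need to be treated as 'off site'.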

Thanks again, Layne
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Layne Zee, Internet Services Tech, Unix/Programming Wizard
Work: Internet Wizards, 212 Railroad Ave N, Kent, WA 98032
e-mail: lzee@wizards.net Phone: 206-813-3033
Home: e-mail: laynezee@cris.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
