Walking tried to access domains that I have not enabled with

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Walking tried to access domains that I have not enabled with

Post by Thunderstone




-domain is an enabler, not a restrictor. It will allow walking any host in
the given domain in addition to any specified host(s). gw will inherently
only walk given hosts and those allowed by -domain.
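A minimal Python sketch of that rule as stated: a host may be walked if it is one of the explicitly given hosts or lies inside a -domain= domain. It assumes a simple case-insensitive suffix match on host names. The thread does not say exactly how gw matches domains, and host_allowed is a hypothetical helper, not part of gw.

```python
def host_allowed(host: str, start_hosts: set[str], domains: list[str]) -> bool:
    """Hypothetical model of gw's host rule: -domain enables extra hosts.

    Assumption: a host is inside a domain if it equals the domain or ends
    with "." + domain (so "xnationwide.com" is NOT inside "nationwide.com").
    """
    host = host.lower().rstrip(".")
    # Hosts named explicitly (e.g. the host of the walk URL) are always allowed.
    if host in {h.lower() for h in start_hosts}:
        return True
    for domain in domains:
        domain = domain.lower().strip(".")
        if host == domain or host.endswith("." + domain):
            return True
    # Anything else is never walked; -domain only adds hosts, it never removes them.
    return False
```

With the settings from this thread (start host vmb.ent.nwie.net, domains vmb.ent.nwie.net and nationwide.com), the sketch allows www.vmb.ent.nwie.net and www.nationwide.com but not louie.lco.nw.nwie.net.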

In any event, gw will look up IP addresses for ALL hosts to resolve aliases.
-L will prevent this.
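One way to read "resolve aliases" is treating different host names that point at the same machine as one host. A minimal Python sketch of that idea, assuming aliases are detected by comparing resolved IP addresses; group_aliases is hypothetical and is not gw's implementation. Each resolve() call is a DNS lookup, which is the per-host work -L is said to prevent.

```python
import socket
from collections import defaultdict
from typing import Callable, Iterable


def group_aliases(
    hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> dict[str, list[str]]:
    """Group host names by the IP address they resolve to (hypothetical sketch).

    Names that share an address are candidate aliases for the same server.
    The resolver is a parameter so it can be replaced without doing real DNS.
    """
    by_ip: dict[str, list[str]] = defaultdict(list)
    for host in hosts:
        try:
            by_ip[resolve(host)].append(host)  # one DNS lookup per host
        except OSError:
            continue  # Unresolvable names are left out of this sketch.
    return dict(by_ip)
```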

The problem with your above command line is that half of your options come
after the Url. All options must be before the first walk Url.

gw -domain=vmb.ent.nwie.net -domain=nationwide.com -dd:/paul/webinator -Iindex.htm -fshtml -fasp -o -xhttp://louie.lco.nw.nwie.net:8585/search97cgi -xhttp://www.vmb.ent.nwie.net/callup? -xhttp://www.vmb.ent.nwie.net/orgchart? -meta=keywords -w0 -p2000 -L http://vmb.ent.nwie.net/~schulzp/
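The ordering rule (all options before the first walk Url) can be checked before starting a walk. A minimal Python sketch, assuming a walk Url is any argument beginning with http:// or https:// and every option begins with "-" (so -xhttp://... still counts as an option); misplaced_options is a hypothetical helper, not a gw feature.

```python
import sys


def misplaced_options(args: list[str]) -> list[str]:
    """Return the options that appear after the first walk URL.

    Assumption: a walk URL starts with http:// or https://, and an option
    is any argument starting with '-'.
    """
    first_url = next(
        (i for i, arg in enumerate(args) if arg.startswith(("http://", "https://"))),
        None,
    )
    if first_url is None:
        return []  # No walk URL at all, so nothing can be out of order.
    return [arg for arg in args[first_url + 1:] if arg.startswith("-")]


if __name__ == "__main__":
    # Usage: python check_gw_args.py <the same arguments you would give gw>
    bad = misplaced_options(sys.argv[1:])
    if bad:
        print("These options come after the first walk URL:", " ".join(bad))
        sys.exit(1)
```

Run against the original command line, this would report -p2000 and the other trailing options; against the corrected line it reports nothing.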



Post by Thunderstone




Thanks. I figured out that the walk URL had to be last.

When the gw command completed (with the -p2000 after the walk URL), some of the pages that it walked were not in the resulting index. Did the -p2000 cause gw to truncate the index after it walked? (My log showed it walked 3400 pages, but nothing from the end of the walk was in the index.)




(In reply to the message from mark@thunderstone.com of 03/27/98 03:36:35 PM.)








Post by Thunderstone




The -p2000 option means fetch 2000 pages on this run, not counting what's already in the database. Any page listed as walked without an error should be in the database; gw won't truncate it after a run. How did you determine that those URLs are not in the database? Do you see them if you run:
gw -s "select Url from html"
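To compare the log against the database systematically, the output of that command can be saved to a file (for example with shell redirection: gw -s "select Url from html" > indexed.txt) and checked against the URLs your log reports as walked. A minimal Python sketch; missing_from_index is hypothetical, and since the exact layout of gw's SQL output isn't shown in this thread, it assumes only that each indexed URL appears as a whitespace-delimited http:// or https:// token.

```python
import re
import sys

# Assumption: URLs in the saved SQL output are whitespace-delimited http(s) tokens.
URL_RE = re.compile(r"https?://\S+")


def missing_from_index(walked_urls: list[str], sql_output: str) -> list[str]:
    """Return walked URLs that do not appear anywhere in the saved SQL output."""
    indexed = set(URL_RE.findall(sql_output))
    return sorted(set(walked_urls) - indexed)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_index.py walked.txt indexed.txt
    # walked.txt: one walked URL per line, extracted from your walk log.
    with open(sys.argv[1]) as f:
        walked = [line.strip() for line in f if line.strip()]
    with open(sys.argv[2]) as f:
        output = f.read()
    for url in missing_from_index(walked, output):
        print(url)
```

An empty result would mean every walked URL is in the html table, which points back at how the missing pages were being looked for rather than at the index itself.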


