jknapp
Posts: 3 Joined: Thu Dec 21, 2000 1:06 pm
by jknapp » Thu Dec 21, 2000 1:09 pm
I have a list of URLs I want indexed. I don't want GW to traverse anything beyond that page of links.
I have tried both -D0 and -D1 and I am still getting pages outside the scope of my initial list of URLs.
Any thoughts?
mark
Site Admin
Posts: 5519 Joined: Tue Apr 25, 2000 6:56 pm
by mark » Thu Dec 21, 2000 1:54 pm
You either have leftover items in your todo list with that depth or you're also using -o. Wipe your todo list and make sure you're not using -o:
gw -wipetodo
gw -D0 "&listofurls"
If you still have problems, please provide your complete gw command line.
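If your options live in a settings file rather than on the command line, a quick pre-flight check along these lines may help confirm whether -o is switched on before you crawl. This is only a sketch: the helper name has_option is made up for illustration, and it assumes the settings file lists one option per line with the leading dash omitted. It does not call gw at all.

```shell
#!/bin/sh
# Hypothetical helper (not part of gw): succeed if FILE contains a line that
# is exactly OPT, ignoring surrounding whitespace and DOS carriage returns.
# Assumes one option per line, written without the leading dash.
has_option() {
    tr -d '\r' < "$1" \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -qxF -- "$2"
}

# Example use; settings.txt is an assumed file name.
if [ -f settings.txt ] && has_option settings.txt o; then
    echo "settings.txt enables -o"
fi
```

The match is on the whole line and is case-sensitive, so an entry such as fasp is not mistaken for o, and D0 is not confused with d-.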
by jknapp » Thu Dec 21, 2000 4:04 pm
Well, yah, I am using -o since the page of URLs includes some that are offsite -- or did I misread the manual on how -o works?
My settings.txt is:
d-
v
o
D0
R
fasp
r
x/stats.asp
z1500000
and I launch with
gw -msettings.txt http://192.168.10.2/index.asp
I am running it now with the -R switch, and that seems to be working... I guess...
by mark » Thu Dec 21, 2000 5:14 pm
That should do what I think you want as long as you've done a -wipetodo first.
That will fetch http://192.168.10.2/index.asp, then all of the offsite pages that it refers to, and quit. Is that not what you want?
-R has nothing to do with crawling behavior.
by jknapp » Thu Dec 21, 2000 8:00 pm
That is what I want it to do, but it seemed to be fetching URLs outside the scope of my document: ones that I knew were linked from documents I wanted searched. (Does that make sense?)
The last time I ran it with the -R switch, it seemed to operate as planned.
I don't have the time right now to see if it works the same way without the -R switch... but I'll let you know.
Thanks for all your help.
Jeff