Walking out of site uncontrollably

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



I'm having a lot of trouble getting Webinator to stay within the site I'm
attempting to walk. Webinator shouldn't leave the site to begin with, since
I haven't specified the -o option, and I'm also setting the -j option to
limit the URLs that are indexed. Despite both of these, Webinator jumps off
to other sites and attempts to index thousands of pages I don't want. See
the log below:

1999/01/30 23:37:59 Begin (20068) gw -wipetodo
1999/01/30 23:37:59 End (20068) Visited 0 pages total
1999/01/30 23:42:19 Begin (20255) gw -d- -Iindex.html -jhttp://hjem.get2net.dk/Svellov/ -meta=keywords,description http://hjem.get2net.dk/Svellov/
1999/01/30 23:42:21 Retrieving http://hjem.get2net.dk/Svellov/
1999/01/30 23:42:22 Retrieving http://hjem.get2net.dk/Svellov/Essen.html
1999/01/30 23:42:24 Retrieving http://hjem.get2net.dk/Svellov/Hall.html
1999/01/30 23:42:25 Retrieving http://hjem.get2net.dk/Svellov/Intim.html
1999/01/30 23:42:26 Retrieving http://hjem.get2net.dk/Svellov/Links.html
1999/01/30 23:42:35 Retrieving http://hjem.get2net.dk/Svellov/List.html
1999/01/30 23:42:37 Retrieving http://hjem.get2net.dk/Svellov/Ludar.html
1999/01/30 23:42:38 Retrieving http://hjem.get2net.dk/Svellov/Magz.html
1999/01/30 23:42:40 Retrieving http://hjem.get2net.dk/Svellov/News.html
1999/01/30 23:42:41 Retrieving http://hjem.get2net.dk/Svellov/Presa.html
1999/01/30 23:42:43 Retrieving http://hjem.get2net.dk/Svellov/Snipp.html
1999/01/30 23:42:44 Retrieving http://members.aol.com/knutmwolf/spielplatz/
1999/01/30 23:42:45 Retrieving http://members.aol.com/Spieleclub/index.htm
1999/01/30 23:42:47 Retrieving http://members.aol.com/knutmwolf/spielplatz/aktuell.htm
1999/01/30 23:42:47 Retrieving http://members.aol.com/knutmwolf/spielp ... itoral.htm
1999/01/30 23:42:48 Retrieving http://members.aol.com/knutmwolf/spielp ... stbuch.htm
1999/01/30 23:42:48 Got signal 2 - will attempt to quit nicely
1999/01/30 23:42:48 Got signal 2 - will attempt to quit nicely
1999/01/30 23:42:48 End (20255) Visited 16 pages total

Any ideas?

Thanks,
Brian





Post by Thunderstone »



I'm guessing that you have walked members.aol.com in this database at some
point. gw remembers the hosts of all URLs you have ever specified in a given
database as valid hosts to visit, so if something on hjem.get2net.dk links
to members.aol.com, gw will accept it.

To make gw forget those hosts/URLs, run:
gw -st "delete from options where Name='URL'"
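
For anyone trying to picture why this happens, here is a rough toy model in Python of the behaviour described above. It is only a sketch and not how gw is actually implemented: the WalkDatabase class and its method names are made up for illustration, and the earlier walk of http://members.aol.com/ is an assumption matching the guess above.

```python
from urllib.parse import urlsplit


class WalkDatabase:
    """Hypothetical stand-in for one walk database (not gw's real code)."""

    def __init__(self):
        # Every base URL ever given to a walk against this database.
        self.remembered_urls = []

    def start_walk(self, base_url):
        # Each new walk adds its base URL; earlier ones are kept.
        self.remembered_urls.append(base_url)

    def allowed_hosts(self):
        # The hosts of all remembered URLs count as valid hosts to visit.
        return {urlsplit(url).hostname for url in self.remembered_urls}

    def accepts(self, link):
        # A link is followed when its host is one of the remembered hosts.
        return urlsplit(link).hostname in self.allowed_hosts()

    def forget_urls(self):
        # Plays the role of deleting the stored URL entries.
        self.remembered_urls.clear()


aol_link = "http://members.aol.com/Spieleclub/index.htm"

db = WalkDatabase()
db.start_walk("http://members.aol.com/")          # assumed earlier walk
db.start_walk("http://hjem.get2net.dk/Svellov/")  # the walk from the log
print(db.accepts(aol_link))  # True: the old host is still remembered

db.forget_urls()
db.start_walk("http://hjem.get2net.dk/Svellov/")
print(db.accepts(aol_link))  # False: only hjem.get2net.dk is remembered now
```

In this model, clearing the remembered URLs and then walking again leaves only the new site's host in the accepted set, which is the effect the delete statement above is meant to have.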
