a couple of questions

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

a couple of questions

Post by Thunderstone »



Hi,
I am using the free Webinator 2.5. When I try to exclude pages from one section of the site, i.e. the newsgroup lists, Webinator goes ahead and indexes them anyway. I have tried

gw -http://www.scu.edu.au/lists/ http://www.scu.edu.au

but it still fetches all the folders under lists, e.g.
www.scu.edu.au/lists/future_1/1851.html

I also tried

gw -xhttp://www.scu.edu.au/lists/ http://www.scu.edu.au/

with the same result: I end up with 10,000 pages of news articles.

I tried to delete the extra pages with

gw -s"delete from html where Url like '/www.scu.edu.au/lists'"
gw -s"delete from refs where Url like '/www.scu.edu.au/lists'"
gw -index

but the pages still appear to be there. Help, please.

When you run Webinator the first time with an option file and URL list, it indexes the home page.

If you run it again, it follows the links on the home page. Is there any way to know how deep it will go before you run it?

Wayne




Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

a couple of questions

Post by Thunderstone »



Was the database clean when you started, or was there stuff left over in the todo list from a previous incomplete walk? Once URLs are in the todo list, gw will go ahead and fetch them, but no new excluded ones should be added. Use "gw -wipe" to start fresh.
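
For example, to discard the old data and re-walk the site with your exclusion in place (a sketch using the exclusion syntax from your post; add whatever option file you normally use):

gw -wipe
gw -xhttp://www.scu.edu.au/lists/ http://www.scu.edu.au/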

Your -s syntax is incorrect. There should be a space separating the option and the SQL statement:
gw -s "delete from html ..."

In the absence of limitation options, such as depth, page count, or exclusions, gw will walk all the way down every specified site. Are you using other options that you haven't mentioned? It sounds like you may be using some limitation that prevents gw from finishing the walk.



