
Fwd: indexing # pages for each site in a list

Posted: Thu May 11, 2000 11:03 am
by Thunderstone




Hi,
I am not able to find a gw option to index only a specified number of pages
from each site in a list.

Using the -p# option with "&files.lst", gw indexes # pages in total
instead of dividing that number among the domains listed in
files.lst.
I have to index many domains but I want to limit the number of pages
indexed for EACH domain.
For example, if I have 3 BIG domains (with about 8000 pages) in my
files.lst and use -p1500, then instead of indexing about 500 pages per
domain it indexes 1000 pages from the first domain, 450 from the second
and 50 from the third!
Thanks.






Posted: Thu May 11, 2000 11:56 am
by Thunderstone


-p is an overall limit per run. You'll need to walk each site separately
and clear the todo list between walks:
gw -p1500 -noindex http://site1
gw -wipetodo
gw -p1500 -noindex http://site2
gw -wipetodo
...
gw -index

On Unix systems you can automate it with a shell loop:
while read -r url; do
  gw -p1500 -noindex "$url"
  gw -wipetodo
done <files.lst
gw -index
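The same loop can be wrapped in a pair of shell functions. The second one also covers the case from the question, where an overall budget such as 1500 pages should be shared evenly across the sites. This is a minimal sketch, not part of gw: walk_each_site and per_site_limit are hypothetical names. Treating files.lst as one URL per line, with blank lines and lines starting with '#' skipped, is an assumption about the list format. Only the gw options from the reply are used.

```shell
#!/bin/sh
# Hypothetical helpers, not part of gw. They use only the gw options shown
# above: -p, -noindex, -wipetodo and -index.

# Print the -p value that spreads an overall page budget evenly over the
# sites in a list file (integer division, so any remainder is dropped).
per_site_limit() {
  budget=$1
  list=$2
  # count lines that are neither empty nor start with '#',
  # i.e. the same lines walk_each_site actually walks (assumed list format)
  n=$(grep -cv -e '^$' -e '^#' "$list")
  if [ "$n" -eq 0 ]; then
    echo "per_site_limit: no sites in $list" >&2
    return 1
  fi
  echo $((budget / n))
}

# Walk each URL in the list with its own page limit, wipe the todo list
# after every walk, then build the index once at the end.
walk_each_site() {
  list=$1
  limit=${2:-1500}
  # "|| [ -n "$url" ]" also processes a last line that has no trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      ''|'#'*) continue ;;   # skip empty lines and '#' comments
    esac
    # </dev/null stops gw from consuming the rest of the list if it reads stdin
    gw -p"$limit" -noindex "$url" </dev/null
    gw -wipetodo </dev/null
  done <"$list"
  gw -index
}
```

For example, walk_each_site files.lst 1500 does what the loop above does. By contrast, walk_each_site files.lst "$(per_site_limit 1500 files.lst)" passes -p500 to each walk for a three-site list, so the whole run should stay at or below roughly 1500 pages. The </dev/null redirections are a precaution: any command run inside a "while read" loop that reads stdin would otherwise swallow the remaining URLs.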