Fwd: indexing # pages for each site in a list

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm


Post by Thunderstone »





Hi,
I can't find a gw option that limits indexing to a specified number of pages
for each site in a list.

When I use the -p# option with "&files.lst", gw indexes # pages in total; it
does not divide that number among the domains listed in files.lst.
I have to index many domains, but I want to limit the number of pages
indexed for EACH domain.
For example, if I have 3 BIG domains (with about 8000 pages) in my
files.lst and use the option -p1500, then instead of indexing about 500 pages
per domain, gw indexes 1000 pages from the first domain, 450 from the second
and 50 from the third!
Thanks.




Post by Thunderstone »



-p is an overall limit per run. You'll need to walk each site separately
and clear the todo list between walks:
gw -p1500 -noindex http://site1
gw -wipetodo
gw -p1500 -noindex http://site2
gw -wipetodo
...
gw -index

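On Unix systems you can automate it with:
# Walk each URL listed in files.lst separately, clearing the todo list each time
while read -r url; do
    gw -p1500 -noindex "$url"
    gw -wipetodo
done < files.lst
# Build the index once, after all walks are done
gw -index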
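
The loop above gives every site the full -p1500 limit. If the goal is the one from the question (about 500 pages per domain out of a total of 1500), the per-site limit could be derived from the list instead. What follows is only a sketch: plan_walks and its two arguments are made-up names, not part of gw. It assumes files.lst holds one URL per line and that -p applies to each run, as described above. It prints the gw commands instead of running them, so the plan can be checked first.

```shell
#!/bin/sh
# Hypothetical helper (not part of gw): split a total page budget evenly
# across the sites in a list and print the gw commands for each walk.
# Dry run only: nothing is executed.
plan_walks() {
    total=$1    # overall page budget, e.g. 1500
    list=$2     # file with one URL per line, e.g. files.lst

    [ -r "$list" ] || { echo "cannot read $list" >&2; return 1; }

    # Count non-empty lines, i.e. the number of sites.
    sites=$(grep -c . "$list")
    if [ "$sites" -eq 0 ]; then
        echo "no sites in $list" >&2
        return 1
    fi

    # Integer division; any remainder is simply left unused.
    per_site=$((total / sites))

    # The extra test keeps a last line that has no trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue    # skip blank lines
        echo "gw -p$per_site -noindex $url"
        echo "gw -wipetodo"
    done < "$list"

    echo "gw -index"
}
```

With three URLs in files.lst, plan_walks 1500 files.lst prints one gw -p500 -noindex line and one gw -wipetodo line per site, followed by a single gw -index. Once the printed commands look right, they can be run by hand or saved to a script.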




