dowalk_beta2 scripted walker CPU utilization

valery
Posts: 26
Joined: Thu Mar 15, 2001 9:24 pm

Post by valery »

Hi,

Per your advice, I'm now running the scripted walker for my search engine index. However, I noticed that CPU utilization is always close to 100% when dowalk_beta2 runs. When I used gw, this did not happen (predictably enough, I suppose, since the main bottleneck should be fetching speed, not the CPU, which is a Celeron 400).

Here is the snapshot from 'top' utility:
___________________________
Mem:  192720K av, 189628K used,  3092K free, 46392K shrd, 59312K buff
Swap: 112448K av,  19356K used, 93092K free               66524K cached

 PID USER     PRI NI SIZE  RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
3754 ********  13  0 2316 2316  1540 R      0 86.9  1.2 2:59 texis
3755 ********   1  0 3560 3560  1656 S      0  6.8  1.8 0:16 texis
___________________________

Is this normal? If not, how can I fix it?

Thanks,
Valery.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

No, that's not normal at all. Did you change dowalk? What kind of exclusions and such have you added or changed? Are you getting errors in gw.log? What are the command lines for the texis processes you listed in top? ps -auxw will give you those.
valery
Posts: 26
Joined: Thu Mar 15, 2001 9:24 pm

Post by valery »

[should have thought about it myself...]
Yes, gw.log contains a lot of messages like this:
___________________
100 /home/httpd/html/BIOZAK/webinator/dowalk_beta2(proclinks) 369: Table refs too big while processing url http://www.biotinst.demon.co.uk/zaxis/buffers.htm
___________________
for every page being inserted into the database.
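
To gauge how widespread a message like this is, the log can be tallied by message text. What follows is only a minimal Python sketch: it assumes every gw.log line has the layout of the one quoted above (numeric code, script path with the function in parentheses, line number, colon, message). The name `count_log_messages` and the regular expression are illustrative assumptions and not part of Webinator.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the single log line quoted above:
#   <code> <script>(<function>) <line>: <message>
LOG_LINE = re.compile(r"^(\d+)\s+(\S+)\((\w+)\)\s+(\d+):\s+(.*)$")

def count_log_messages(lines):
    """Count messages by text, ignoring the per-page URL suffix."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        message = match.group(5)
        # Drop " while processing url ..." so identical errors group together.
        message = message.split(" while processing url ", 1)[0]
        counts[message] += 1
    return counts

sample = [
    "100 /home/httpd/html/BIOZAK/webinator/dowalk_beta2(proclinks) 369: "
    "Table refs too big while processing url "
    "http://www.biotinst.demon.co.uk/zaxis/buffers.htm",
]
print(count_log_messages(sample))
```

To run it on a real log, pass an open file object instead of `sample`, since the function accepts any iterable of lines.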

The output of ps -uawx:
___________________
/home/httpd/cgi-bin/BIOZAK/texis -d/home/httpd/html/BIOZAK/webinator/db maxpages=1000 top=http://www.boehringer-ingelheim.com /home/httpd/html/BIOZAK/webinator/dowalk_beta2/dispatch.txt
/home/httpd/cgi-bin/BIOZAK/texis db=/home/httpd/html/BIOZAK/webinator/db maxpages=1000 top=http%3A//www.boehringer-ingelheim.com /home/httpd/html/BIOZAK/webinator/dowalk_beta2/go.txt
___________________

Background info:
1. I created the tables using the standard gw -createdb directive.
2. maxpages is an external parameter, as you can see.
3. I added some checks in the walker on the Title info, but nothing of the kind that would create 100% CPU loads (just simple REX patterns)...

Thanks for your fast replies!
Valery.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The "Table refs too big" message is because the vortex distributed with Webinator will not add to tables larger than 30 megs. Full Texis does not have such a limit. If you don't need the "parents" feature of the search, you can set $storerefs to 0 to prevent the insertions.

What are the REX patterns? I asked because an incorrectly formed REX pattern might cause excessive CPU usage.
valery
Posts: 26
Joined: Thu Mar 15, 2001 9:24 pm

Post by valery »

Then how can I index more than 10000 documents with my paid commercial Texis license? gw does not have such a restriction, does it? Does this mean that the scripted walker is really useful only for Full Texis users?

BTW, my html table is 134MB now and Vortex does not complain. Is the 30MB limit only for the refs table?

I would like to use the scripted walker to populate my database, and I would be very glad if I could get gw-style limits for essentially the same task. Is this possible?

Thanks,
Valery.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

gw does not have that restriction, only vortex as shipped with Webinator. The limit is on the .tbl files, not the .blb file which contains most of the html table data. This variation will be removed with the next release of Webinator so that gw and vortex agree.
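
Since the limit applies to the .tbl files, one way to see which tables are affected is to compare each .tbl file's size against the threshold. The following is a rough Python sketch, not a Webinator tool: the function name `tables_over_limit` is hypothetical, and it assumes "30 megs" means 30 * 1024 * 1024 bytes, which the thread does not state exactly.

```python
from pathlib import Path

# Assumption: "30 megs" is taken as 30 * 1024 * 1024 bytes; the exact
# threshold used by vortex is not stated in the thread.
LIMIT_BYTES = 30 * 1024 * 1024

def tables_over_limit(db_dir, limit=LIMIT_BYTES):
    """Return (file name, size in bytes) for each .tbl file larger than limit."""
    oversized = []
    # Only .tbl files are checked; .blb files are not subject to the limit.
    for tbl in sorted(Path(db_dir).glob("*.tbl")):
        size = tbl.stat().st_size
        if size > limit:
            oversized.append((tbl.name, size))
    return oversized
```

For the database in this thread, the call would be `tables_over_limit("/home/httpd/html/BIOZAK/webinator/db")`. An empty list means no .tbl file exceeds the assumed threshold.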
valery
Posts: 26
Joined: Thu Mar 15, 2001 9:24 pm

Post by valery »

> This variation will be removed with the next release of Webinator so that gw and vortex agree.

Will Vortex allow tables over 30MB, or will Webinator stop allowing them? If the latter, how does that square with my commercial license, which allows unlimited web indexing?

Thanks,
Valery.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The vortex limit will be changed. It was placed in there before there was a scripted walker.
valery
Posts: 26
Joined: Thu Mar 15, 2001 9:24 pm

Post by valery »

Hi Mark, sorry to bug you with this again, but would it be possible for me to remove this restriction now?

This is a wonderful product and I really want to make the most of it.

Thanks for your support,
Valery.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Sorry, we can't do that right now. It would require a complete distribution of the major new version, which is not quite ready yet.