Walk Dies - related to Process Size

pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

Hello
I recently moved my Webinator install from a heavily loaded Linux box to a fat Solaris box. The walk settings are the same, but the crawl dies. If I raise the process size from small to unlimited, the crawl gets further with each step up, but even unlimited dies at around 40K pages, while my other server got to 800K. How can I monitor it to see what is bothering it?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

You can check the size of the cururls* file in the dataspace directory. It is possible that the memory allocation routines on Linux are better at reclaiming freed space than on Solaris.
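
For anyone who wants to watch that size over time rather than check it by hand, here is a minimal Python sketch. It only assumes the files match the pattern cururls* inside the dataspace directory, as described above; the function names and the directory path you pass in are hypothetical and not part of Webinator.

```python
import glob
import os
import time


def cururls_size(dataspace_dir):
    """Return the combined size in bytes of all cururls* files in dataspace_dir."""
    pattern = os.path.join(dataspace_dir, "cururls*")
    return sum(os.path.getsize(p) for p in glob.glob(pattern) if os.path.isfile(p))


def watch(dataspace_dir, interval_seconds=60, samples=5):
    """Print the combined size a few times so growth during a walk is visible."""
    for _ in range(samples):
        size = cururls_size(dataspace_dir)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp}  cururls*: {size:,} bytes ({size / 2**20:.1f} MiB)")
        time.sleep(interval_seconds)
```

Calling watch() with the path to your own dataspace directory while the walk runs should show whether the file keeps growing right up to the point where the crawl dies.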

Is there any indication why it dies around 40K? You could certainly keep refreshing the crawl with a process size limit of large, and it will eventually crawl everything without using too much memory.
John Turnbull
Thunderstone Software
pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

OK, we are on to something: cururls is around 8 gig. What can I do?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

As above: keep the process limit lower and schedule refresh crawls frequently. That will dump the to-do list to disk and restart, rather than trying to keep 8+ gig in RAM.

Does anything stand out about the URLs, and why the same URLs weren't seen on the Linux box?
John Turnbull
Thunderstone Software
pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

I'm sorry, I meant 8 meg. If I lower the process limit, it dies even sooner. I just don't get any indication that we have reached the bottom of the search like I did on our other crawl, which stopped after 24 hours at 800K documents. How can I guarantee that my live search will index all of our stuff (or 800K documents, which feels like a lot) on the refresh? So maybe my idea of completing the new search won't fly, but I just can't seem to gauge whether we have hit the bottom.
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

The simplest way to know you've hit the bottom is that on the walk status page the number "todo" should be 0.
John Turnbull
Thunderstone Software
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

To get everything, set process size to large or huge. Set the schedule to every 15 minutes. Then let it refresh until it's done.
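
The logic behind that advice (refresh, check the "todo" count, repeat until it is 0) can be sketched in Python as below. This is only an illustration of the loop, not Webinator's scheduler: run_refresh and read_todo are hypothetical callables you would have to supply yourself, and the 900-second pause simply mirrors the 15-minute schedule.

```python
import time


def refresh_until_done(run_refresh, read_todo, pause_seconds=900, max_rounds=100):
    """Repeat refresh crawls until the to-do count reaches 0.

    run_refresh: hypothetical callable that performs one refresh crawl.
    read_todo:   hypothetical callable that returns the current "todo" count.
    Returns the number of refresh rounds that were needed.
    """
    for round_number in range(1, max_rounds + 1):
        run_refresh()
        if read_todo() == 0:
            return round_number
        # Wait before the next refresh, like a 15-minute schedule would.
        time.sleep(pause_seconds)
    raise RuntimeError(f"to-do list still not empty after {max_rounds} rounds")
```

The max_rounds cap is there so the loop gives up loudly instead of running forever if the to-do list never drains.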
pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

OK, cool, but where can I see my 800K number?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

The top of the walk status reports the total number of pages in the database.
pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

Hi Mark, it croaked:
Process memory limit exceeded (current: 792,748,032, limit: 700,000,000)


That's on HUGE.
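
To put a number on how far over the limit that is, here is a small Python sketch that pulls the two figures out of the message and compares them. The regular expression is written against the one line quoted above only; I am assuming the values are bytes, which the message itself does not state, and parse_limit_message is just a name made up for this example.

```python
import re


def parse_limit_message(message):
    """Extract the (current, limit) numbers from a 'limit exceeded' line."""
    match = re.search(r"current:\s*([\d,]+),\s*limit:\s*([\d,]+)", message)
    if match is None:
        raise ValueError("no current/limit pair found in message")
    # Strip the thousands separators before converting to integers.
    current, limit = (int(group.replace(",", "")) for group in match.groups())
    return current, limit


message = "Process memory limit exceeded (current: 792,748,032, limit: 700,000,000)"
current, limit = parse_limit_message(message)
overshoot = current - limit
print(f"over by {overshoot:,} ({overshoot / limit:.1%} of the limit)")
# prints: over by 92,748,032 (13.2% of the limit)
```

So the process was roughly 13% past the 700,000,000 limit when it was stopped.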