Error message

michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Error message

Post by michel.weber »

Hi, I'm seeing the following error messages in the 'messages' log:

Dec 1 09:56:02 alatar kernel: VM: killing process texis
Dec 1 09:56:02 alatar kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 1 09:56:02 alatar kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)

This has been going on for some time, but it is happening more and more often.

It seems to affect 2 (out of 3) of our appliances.

The killed process is not always the same ...

Should I be concerned?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Error message

Post by mark »

The system was completely out of memory and had to start killing processes. Process "texis" being killed that way may cause database corruption depending on what it was doing when killed. You're probably running too many big walks at the same time.

The tech support info page will give you a rundown of memory usage.
John
Site Admin
Posts: 2623
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Error message

Post by John »

What do you have "Maximum Process Size" set to in your profiles?
John Turnbull
Thunderstone Software
michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Error message

Post by michel.weber »

I set them to large.

On one of the 2 boxes there were 2 rewalks running (both replicating locally and remotely).

On the second there was NO rewalk running locally. Only the receiving profile was being updated.
Currently there is an updateindex script running which uses ~80% of physical memory.

Question: how many walks, and of what sizes, is it safe to run at the same time?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Error message

Post by mark »

That will require some math.
Consider: Total system RAM, "Parallelism: Servers", and "Max Process Size". Each parallel server can use up to "Max Process Size".

For each profile, multiply its "Parallelism: Servers" by that profile's "Max Process Size". That's the rough memory requirement for that profile. Add up the results for every profile that may run at the same time.

The total should not exceed system RAM. It is better to stay a little under, to leave room for caching etc.

Process sizes:
Small = 25MB
Medium = 50MB
Large = 100MB
Huge = 700MB
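
For illustration only, here is a minimal Python sketch of that arithmetic. It is not part of the appliance; the function names and the example profile list are made up, and the only values taken from above are the four process sizes. It treats "Servers" times "Max Process Size" as the upper bound for each profile and sums the profiles that could run together.

```python
# Process sizes in MB, as listed above.
PROCESS_SIZE_MB = {"small": 25, "medium": 50, "large": 100, "huge": 700}


def profile_memory_mb(servers, size):
    """Rough upper bound for one profile: servers x max process size."""
    return servers * PROCESS_SIZE_MB[size.lower()]


def total_memory_mb(profiles):
    """Sum the bound over all profiles that may walk at the same time.

    `profiles` is a list of (servers, size) pairs, e.g. (4, "large").
    """
    return sum(profile_memory_mb(servers, size) for servers, size in profiles)


def fits_in_ram(profiles, total_ram_mb, headroom_mb=0):
    """True if the combined bound plus headroom stays within system RAM.

    `headroom_mb` is a hypothetical allowance for caching and other
    processes; pick a value that suits your system.
    """
    return total_memory_mb(profiles) + headroom_mb <= total_ram_mb


if __name__ == "__main__":
    # Hypothetical setup: one profile with 4 Large servers, one with 2 Small.
    concurrent = [(4, "large"), (2, "small")]
    print(total_memory_mb(concurrent), "MB needed at most")
    print("fits:", fits_in_ram(concurrent, total_ram_mb=1024, headroom_mb=100))
```

This only covers the walk processes themselves, so treat the result as a rough estimate and not a guarantee.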

If you don't know how much memory your appliance was purchased with, you can find out from the system log called "messages". Look for "kernel: Memory:". If you've booted recently, it will be near the end of the log. If not, it may have been rotated into messages.1, messages.2, etc.
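
If you have a copy of the log files on a machine where you can run scripts, the search could be sketched like this. The path `/var/log/messages*` and the function name are assumptions for illustration; the only thing taken from above is the "kernel: Memory:" text to look for. It assumes the rotated files are plain text.

```python
import glob


def find_memory_lines(pattern="/var/log/messages*", needle="kernel: Memory:"):
    """Return (path, line) pairs for every log line containing `needle`.

    `pattern` is a glob that should match messages, messages.1, messages.2, ...
    The default location is an assumption; adjust it to where your logs are.
    """
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps odd bytes in the log from aborting the scan.
        with open(path, errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for path, line in find_memory_lines():
        print(f"{path}: {line}")
```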
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Error message

Post by mark »

p.s.
Sometimes the "Remove common" feature can use more than the max process size amount of memory. It runs after the walk is complete but before the search index is created/updated.
michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Error message

Post by michel.weber »

I did not know I had the option of choosing the memory size of the appliance.

It looks like we have 1 GB in each.

So with a walk size of 'large' and 1 server, that should leave plenty of RAM.

What about replication? Does that take a lot of memory, or just a lot of CPU?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Error message

Post by mark »

You don't specify the amount of RAM directly, but the size of appliance you purchase affects the amount of RAM installed.

Replication shouldn't use a lot of memory except on the receiver when it updates the search index for the replicated records.
michel.weber
Posts: 256
Joined: Sat Oct 08, 2005 12:40 pm

Error message

Post by michel.weber »

Hi

I reduced the size of the walks and the error is still there.

I have 4 profiles (3 small and 1 large). Even if they were all running at the same time, they should not overwhelm the memory, should they?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Error message

Post by mark »

Which error, the "VM: killing process" one? You're getting new ones with current times? Can you tell what is/was going on when that occurred?

Turn off "remove common" to see if that fixes it.