[Madlug] OOM killer strikes unexpectedly

Kevin Buhr buhr+madlug at asaurus.net
Thu Jul 5 14:39:40 CDT 2007


Will Maier <willmaier at ml1.net> writes:
>
> Last night, the OOM killer kicked in when (according to sar) only
> 30% of system memory was in use; swap was still entirely empty.

Will,

Your output indicates that this was a GFP_KERNEL (gfp_mask=0xd0)
request that couldn't be satisfied.  GFP_KERNEL allocations can't use
highmem, which explains why the killer was invoked with so much memory
free overall (12 gigs, nearly all of it highmem).
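
For reference, here is a rough sketch of how that mask decodes.  The
flag values are the ones I recall from the 2.6.9-era
include/linux/gfp.h, so treat them as assumptions and check your own
kernel tree; the helper names are made up for illustration.

```python
# Flag bits as recalled from 2.6.9-era include/linux/gfp.h (assumed;
# they have changed in later kernels, so verify against your source).
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_gfp_mask(mask):
    """Return the names of the known flag bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]


def may_use_highmem(mask):
    """True if the request is allowed to be satisfied from the highmem zone."""
    return bool(mask & 0x02)


print(decode_gfp_mask(0xd0))  # ['__GFP_WAIT', '__GFP_IO', '__GFP_FS']
print(may_use_highmem(0xd0))  # False
```

The __GFP_HIGHMEM bit is clear in 0xd0, so the request has to come
out of low memory no matter how much highmem is free.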

Nonetheless, the killer probably was invoked too aggressively.  There
seem to have been an enormous number of problems with the oom killer
in 2.6.9 kernels.  If you search for "oom killer 2.6.9", you should
find a number of threads addressing some of the issues, and some of
the patches or suggestions might work for you.  If there's any hope of
testing a box with a more recent 2.6.x kernel, I'd start with that.
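
If you do get a test box, it may help to watch low memory
specifically rather than the overall figure sar reports.  A minimal
sketch, assuming a 32-bit highmem kernel that exposes LowTotal and
LowFree in /proc/meminfo; the function names and the sample numbers
are invented for illustration and are not taken from your machine.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def low_free_fraction(fields):
    """Fraction of low memory still free, or None if the kernel does
    not report LowTotal/LowFree (for example, no highmem support)."""
    total = fields.get("LowTotal")
    free = fields.get("LowFree")
    if not total or free is None:
        return None
    return free / total


# Made-up sample; on a real box use open("/proc/meminfo").read().
SAMPLE = """\
MemTotal:     12466108 kB
HighTotal:    11665344 kB
HighFree:      8700000 kB
LowTotal:       800764 kB
LowFree:          4000 kB
"""

fields = parse_meminfo(SAMPLE)
print(low_free_fraction(fields))
```

A small fraction here alongside plenty of HighFree would match the
pattern described above.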

Also, switching to a 64-bit kernel should fix the problem, as it
should mean all your memory is "normal" rather than "highmem".  I will
point out that there's no particular barrier to running select 32-bit
user space software on an x86_64 Linux installation, as long as
appropriate libraries are made available.  Alternatively, I know you
can install a complete 32-bit user-space "chroot" under a bare-bones
x86_64 installation and run essentially everything from in there, so
that might be a possible solution.

Another option is to find a patch that changes the 3G/1G user/kernel
split to 2G/2G.  I don't think this was ever a stock 2.6.9 feature,
though it's in the mainline 2.6.x kernels now.  It's the second
number that determines the amount of "normal" memory, so the switch
should double it at the expense of reducing the per-process address
space from 3G to 2G.  If you are using any proprietary kernel
modules, beware: I believe the change breaks module compatibility.
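
To put rough numbers on that: the usable "normal" memory is a bit
less than the kernel's share of the address space, because some of it
is reserved at the top for vmalloc/ioremap.  The sketch below assumes
the usual 128 MB reserve on 32-bit x86; that figure and the function
name are my assumptions, not something from your output.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Assumed default vmalloc/ioremap reserve on 32-bit x86.
VMALLOC_RESERVE = 128 * MB


def lowmem_limit(kernel_space):
    """Approximate directly mapped ("normal") memory for a given
    kernel share of the 4G virtual address space."""
    return kernel_space - VMALLOC_RESERVE


for name, kernel_space in [("3G/1G", 1 * GB), ("2G/2G", 2 * GB)]:
    print(name, lowmem_limit(kernel_space) // MB, "MB")
# 3G/1G 896 MB
# 2G/2G 1920 MB
```

So under those assumptions the 2G/2G split gives a little more than
twice the low memory of the default.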

Finally, hugemem kernels implement a 4G/4G split, at a significant
cost to syscall performance, to give you 4G of process address space
plus 4G of normal kernel memory.  Assuming you're running Red Hat,
they should have stock kernels with hugemem support.  Again, this
will break module compatibility.

Hope that helps...

-- 
Kevin <buhr+madlug at asaurus.net>


