phk malloc, was (Re: ptmalloc2)

Dan Melomedman dan at a.mx.devonit.com
Tue Feb 22 22:33:28 PST 2005


Bill Hacker wrote:
> I still think fixing the application is preferable.

What's there to fix if it's not the application's fault?
The application asks the OS for exactly what it wants: physical memory.
The OS, on the other hand, only promises memory; it doesn't guarantee
to deliver it. When I go to a bank to cash a check, I expect cold, hard
cash, not an IOU. Should the bank blame me for asking for money it
doesn't have?

> Documenting an 'overcommit switch' is all well and good, but theory 
> aside, how stable can you expect MessageWall to be *in the real world* 
> on a potentially resource-challenged LinBox that is running other things 
> as well?

In theory it should be much more stable. If not, then there's something
wrong with the Linux kernel, and I'll report a bug. Robert Love wrote the
overcommit accounting patch sometime in 2002 to make the out-of-memory
situation impossible on Linux with careful tuning. It moves all memory
failures to the allocation routines: you either get physical memory, or
the allocation routines return failure and you retry later. This is much
better than random SIGKILLs sent to your preallocating services when the
OS can't find physical memory for them under stress.

In theory this should be possible:

1) Turn off swap.
2) Set rlimits on _all_ services so that together they don't exceed
the available physical memory - leave a reasonable amount of slack,
let's say 20 percent, for the system when all is said and done.
3) Shut off overcommit and let 'er rip.
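The three steps above might look like this on a Linux box (a sketch,
not tested tuning advice; the 256 MiB per-service cap and the 80
percent ratio are example figures chosen to leave the ~20 percent
slack):

```shell
# 1) Turn off swap.
swapoff -a

# 2) Cap each service's address space (example: 256 MiB per service,
#    sized so all caps together stay below physical RAM).
ulimit -v 262144        # KiB; run in the shell that launches the service

# 3) Turn off overcommit: mode 2 is strict accounting.  With swap off,
#    the commit limit is ram * overcommit_ratio / 100.
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80
```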

The above should guarantee that the 'preallocating' service will never
crash due to overcommit. The other processes are allowed to grow to
their rlimit, and will have their allocations fail when they reach it.

And we'll see how it does. This is on a test box of course. Am I missing
anything?





More information about the Kernel mailing list