zalloc project

Matthew Dillon dillon at
Fri May 14 16:58:30 PDT 2021

SWAPMETA is used for configuring and paging to swap space, so there is no
boot dependency.  That is, the kernel can bootstrap completely without
ever having to use swap space.  However, once the system is running,
paging to swap typically occurs due to low-memory situations (if you think
about it, that's why the system decides it needs to page to swap in the
first place).

Since one of the few ways the system has to free memory is to page data to
swap, it can wind up in a no-win situation if the page-out code has to
allocate kernel memory in order to manage the swapped pages and then
blocks on those allocations.  The key here is that the system has to be able
to 'make progress' freeing memory, so as long as a few pages are available
in the emergency free page reserve, and as long as those pages are ONLY
needed to back actual objects (and no additional pages are needed e.g. for
PV entries or MAP entries or page-table-page infrastructure to support the
new mapping), then the system can make progress.

This is the guarantee that zalloc provides and that none of the other
memory subsystems in the kernel can provide.  zalloc() is able to
guarantee that even a single page allocation will be sufficient to make
progress on a stuck zalloc request.  Since only one page is needed, the
emergency page reserve is sufficient for that.  And then the paging code is
able to free up a page soon after so the emergency page reserve doesn't
become exhausted.
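
To make the 'make progress' argument concrete, here is a toy user-space
model, not kernel code.  The function pageout_sim() and its parameters
are made up for illustration.  It assumes each page-out must first take
'need' pages from the emergency reserve for bookkeeping and then returns
exactly one freed page to it.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not the kernel's accounting):
 *   reserve = pages currently in the emergency free page reserve
 *   need    = pages each page-out must allocate before it can complete
 *   want    = number of page-outs we would like to perform
 * Returns how many page-outs completed before the model got stuck.
 */
static int pageout_sim(int reserve, int need, int want)
{
    int done = 0;

    while (done < want) {
        if (reserve < need)
            break;          /* cannot allocate bookkeeping memory: stuck */
        reserve -= need;    /* memory consumed to manage the swapped page */
        reserve += 1;       /* the paged-out page itself is freed */
        done++;
    }
    return done;
}
```

With need == 1 the reserve never shrinks, so the model keeps making
progress indefinitely.  With need == 2 it drains by one page per page-out
and eventually stalls, which is the no-win situation described above.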


I believe that the kmalloc_obj subsystem (which is brand-new) can be
adjusted to make these guarantees, primarily by ensuring that extra slabs
are allocated ahead-of-time whenever possible.  And for anything which is
boot-time sensitive, the extra slabs could be installed at early boot prior
to first use, kinda like how the zalloc system is initialized, except these
slabs are 128KB each.  Still, there is no reason why a few couldn't be
declared statically as BSS in the kernel image.  The code in question would
be the _kmalloc_obj() path starting at line 664 of kern_kmalloc.c, and its
related slab allocation which occurs at line 821.  Basically, some code
would have to be added to attempt to maintain N (e.g. 3) slabs on the
per-zone ggm->empty list on any normal allocation that eats a slab out of
that list, using non-blocking kmem_slab_alloc() calls, and falling back to
blocking kmem_slab_alloc()'s if the list winds up being empty anyway.
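
A rough user-space sketch of that refill policy follows.  It is not the
kern_kmalloc.c code: struct zone_sim, slab_alloc_nowait(),
slab_alloc_wait() and zone_take_slab() are hypothetical stand-ins for the
per-zone state and for non-blocking / blocking kmem_slab_alloc() calls,
and RESERVE_TARGET is just the example value of N.

```c
#include <assert.h>
#include <stdbool.h>

#define RESERVE_TARGET 3    /* "N" from the text; 3 is only the example */

/* Hypothetical stand-in for per-zone state; not the real ggm structure. */
struct zone_sim {
    int empty_count;     /* slabs sitting on the simulated empty list */
    int backing_free;    /* slabs obtainable without blocking */
    int blocking_calls;  /* times we had to fall back to blocking */
};

/* Simulated non-blocking slab allocation: fails instead of sleeping. */
static bool slab_alloc_nowait(struct zone_sim *z)
{
    if (z->backing_free == 0)
        return false;
    z->backing_free--;
    return true;
}

/* Simulated blocking slab allocation: always succeeds, but is counted. */
static void slab_alloc_wait(struct zone_sim *z)
{
    z->blocking_calls++;
}

/* Consume one slab for a normal allocation, then top the list back up. */
static void zone_take_slab(struct zone_sim *z)
{
    if (z->empty_count == 0) {
        slab_alloc_wait(z);     /* list is empty anyway: block */
        z->empty_count++;
    }
    z->empty_count--;           /* the allocation eats one slab */

    /* Opportunistic refill toward N, using only non-blocking attempts. */
    while (z->empty_count < RESERVE_TARGET && slab_alloc_nowait(z))
        z->empty_count++;
}
```

The point of the sketch is the ordering: the blocking path is only
reached once the pre-allocated slabs are gone, so under normal conditions
the allocation that eats a slab never has to sleep.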

Probably a bit of work in kmem_slab_alloc() would also be needed to support
M_INTNOWAIT in the slab-maintain code to allow the reserve to be used.  But
M_WAITOK would still have to be used if slabs wind up being exhausted
anyway.  Something like that.  There are also additional possible
low-memory deadlock points, as secondary factors, because
kmem_slab_alloc() dynamically allocates KVA space, but we would have to
do a lot of low-memory / paging testing to determine what deadlocks might
still exist.

Right now the system has basically solved the low-memory deadlock issues so
we don't want to reintroduce any.


At the moment nobody is planning on doing this work so if you would like to
continue to review it, please do!
