zalloc project

Mon May 17 11:01:23 PDT 2021

On Fri, May 14, 2021 at 02:58:30PM -0900, Matthew Dillon wrote:
> SWAPMETA is used for configuring and paging to swap space, so there is no
> boot dependency; that is, the kernel can bootstrap completely without
> ever having to use swap space.   However, once the system is running,
> paging to swap typically occurs due to low-memory situations (if you think
> about it, that's why the system decides it needs to page to swap in the
> first place).
> 
> Since one of the few ways the system has to free memory is to page data to
> swap, it can wind up in a no-win situation if the page-out code winds up
> having to allocate kernel memory in order to manage the swapped pages and
> blocks on said allocations.  The key here is that the system has to be able
> to 'make progress' freeing memory, so as long as a few pages are available
> in the emergency free page reserve, and as long as those pages are ONLY
> needed to back actual objects (and no additional pages are needed e.g. for
> PV entries or MAP entries or page-table-page infrastructure to support the
> new mapping), then the system can make progress.
> 
> This is what zalloc is able to guarantee that none of the other memory
> subsystems in the kernel are able to guarantee.  zalloc() is able to
> guarantee that even a single page allocation will be sufficient to make
> progress on a stuck zalloc request.  Since only one is needed, the
> emergency page reserve is sufficient for that.  And then the paging code is
> able to free up a page soon after so the emergency page reserve doesn't
> become exhausted.
> 
> --
> 
> I believe that the kmalloc_obj subsystem (which is brand-new) can be
> adjusted to make these guarantees, primarily by ensuring that extra slabs
> are allocated ahead-of-time whenever possible.  And for anything which is
> boot-time sensitive, the extra slabs could be installed at early boot prior
> to first use (kinda like how the zalloc system is initialized, except these
> slabs are 128KB each.  Still, no reason why a few couldn't be declared
> statically as BSS in the kernel image).   The code in question would be the
> _kmalloc_obj() path starting line 664 of kern_kmalloc.c, and its related
> slab allocation which occurs at line 821.  Basically, some code would have
> to be added to attempt to maintain N (e.g. 3) slabs on the per-zone
> ggm->empty list on any normal allocation that eats a slab out of that list,
> using non-blocking kmem_slab_alloc() calls, and falling back to blocking
> kmem_slab_alloc()'s if the list winds up being empty anyway.
> 
> Probably a bit of work in kmem_slab_alloc() would also be needed to support
> M_INTNOWAIT in the slab-maintain code to allow the reserve to be used.  But
> M_WAITOK would still have to be used if slabs wind up being exhausted
> anyway.  Something like that.  There are also additional possible
> low-memory deadlock points involved in terms of the fact that
> kmem_slab_alloc() dynamically allocates KVA space as secondary factors, but
> we would have to do a lot of low-memory / paging testing to determine what
> deadlocks might still exist.
> 
> Right now the system has basically solved the low-memory deadlock issues so
> we don't want to reintroduce any.
> 
> --
> 
> At the moment nobody is planning on doing this work so if you would like to
> continue to review it, please do!
> 
> -Matt

Thanks, this is really helpful. I have a lot of code-reading to do, to
make sure I understand what is happening.
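
As a way of checking the slab-reserve idea above, a toy user-space
model might look like the sketch below. Everything in it is
hypothetical: the toy_* names, the toy_low_memory flag and the
malloc()-backed allocator are stand-ins, not the real kmem_slab_alloc()
or the ggm->empty list in kern_kmalloc.c. It only models the policy of
topping up the reserve without blocking and falling back to a blocking
allocation when the reserve is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SLAB_SIZE   (128 * 1024) /* slab size mentioned in the thread */
#define RESERVE_MAX 3            /* the "N" in the thread; value assumed */

/* Hypothetical per-zone reserve of empty slabs. */
struct toy_zone {
    void *empty[RESERVE_MAX];
    int   nempty;
};

/* Test knobs, not kernel state: simulate memory pressure and count
 * how often the blocking path was taken. */
static bool toy_low_memory = false;
static int  toy_blocking_calls = 0;

/* Stand-in for a slab allocation. A non-blocking request fails under
 * simulated pressure; a "blocking" request always goes to malloc(). */
static void *toy_slab_alloc(bool can_block)
{
    if (!can_block && toy_low_memory)
        return NULL;
    if (can_block)
        toy_blocking_calls++;
    return malloc(SLAB_SIZE);
}

/* Top the reserve up to RESERVE_MAX using non-blocking attempts only.
 * Giving up early is fine; the reserve is best-effort. */
static void toy_zone_refill(struct toy_zone *z)
{
    while (z->nempty < RESERVE_MAX) {
        void *slab = toy_slab_alloc(false);
        if (slab == NULL)
            break;
        z->empty[z->nempty++] = slab;
    }
}

/* Take one slab for a normal allocation: use the reserve if it has
 * anything, otherwise fall back to a blocking allocation. Either way,
 * try to replenish the reserve afterwards. */
static void *toy_zone_take_slab(struct toy_zone *z)
{
    void *slab;

    assert(z->nempty >= 0 && z->nempty <= RESERVE_MAX);
    if (z->nempty > 0)
        slab = z->empty[--z->nempty];
    else
        slab = toy_slab_alloc(true);
    toy_zone_refill(z);
    return slab;
}
```

In this model, a zone that is pre-filled with toy_zone_refill() and
then put under simulated pressure can hand out RESERVE_MAX slabs before
the blocking path is reached. It says nothing about the KVA allocation
or M_INTNOWAIT questions raised above.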

I don't know if I'll produce anything useful, but I'm certainly having
fun reading.

-- 
James