zalloc project

James Cook falsifian at
Fri May 14 14:25:12 PDT 2021

On Fri, May 14, 2021 at 10:04:28AM -0900, Matthew Dillon wrote:
> Hi James.   Yes, I think removing zalloc is still worth doing as a
> clean-up.  Honestly I'd like to get rid of objcache as well, replacing
> both zalloc and objcache with the new kmalloc_obj() facility.  I don't
> think this would be a beginner project, though, but you could take a look
> at it and see if you think you can do it.  I'll explain the complexity.
> For zalloc there is a chicken-and-egg problem.  Currently three sub-systems
> still use zalloc:
> SWAPMETA  - side-structures used to manage swap space
> PV ENTRY  - side-structures used to manage page table pages
> MAP ENTRY - side-structures used to manage virtual memory mappings
> The main reasons these subsystems have not been converted are (A)
> bootstrapping in early boot and (B) low-memory deadlock issues.
> For (A) an example is that bootstrapping the kernel VM system in early boot
> requires a number of PV entries and map entry structures, but obviously
> those cannot be allocated that early in the boot so they must be
> pre-reserved.  zalloc() has a mechanism to do this.   To implement this
> pre-reservation with kmalloc_obj() would require reserving at least one
> slab for each of the three zones and enhancing the kmalloc_obj() subsystem
> to utilize the reserve during early boot.   This part isn't difficult but
> it does entail some significant programming.
> For (B) an example would be... when a normal kmalloc() needs to allocate a
> new block of memory it calls into the VM system which might also need to
> allocate a PV entry or a MAP entry... which in turn calls kmalloc() again and
> boom, low-memory deadlock.  Similarly, if the pageout daemon is trying to
> page to swap due to a low-memory condition it may have to allocate
> swap-meta structures... but in a low-memory situation that can deadlock as
> well.
> Dealing with these issues requires reducing the number of side-structures
> that might need to be incidentally allocated in low-memory situations,
> meaning that if the zalloc subsystem is converted to kmalloc_obj(), then
> kmalloc_obj() needs to be enhanced to maintain a few extra slabs ready to
> go.   I think implementing this enhancement to kmalloc_obj() would not be
> too hard.. basically when kmalloc_obj() needs to allocate a new slab it
> will check to see how many are on the free list and when there are not
> enough it will pro-actively do non-blocking allocations ahead of the
> demand.
> So for (A) we would need a mechanism to register one or more full slabs
> with a kmalloc_obj() zone, and for (B) the kmalloc_obj() function itself
> would pro-actively always try to maintain X extra slabs on its free list
> (non-blocking, so in a low-memory situation it will try to maintain the
> additional slabs but not block if it can't).
> So getting rid of zalloc is not trivial.  zalloc deals with all of these
> situations and our current kmalloc_obj() facility does not.  Nor does
> objcache really.
> -Matt

Thanks for taking the time to write that detailed answer. I hadn't seen

Is (A) really an issue for SWAPMETA? The current code in swap_pager.c
doesn't bother with zbootinit, and it looks like the code there isn't
run until fairly late in the boot process (when the pageout threads are

(B) sort-of occurred to me, but I didn't realize zalloc is specifically
designed to help with that, so I had assumed stuff like
vm_page_count_severe() ensured enough memory was free to avoid the
issue (for SWAPMETA; (A) would still be a problem for the other two
subsystems). I guess those parameters are not tuned to guarantee (B)
won't happen for SWAPMETA?

Assuming I'm right about (A), would it make sense to start by solving
(B), i.e. make kmalloc_obj maintain extra slabs? Then SWAPMETA could be
switched over before (A) gets solved for the other two subsystems.
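
To check my understanding of (B), here is a minimal userspace sketch in
C of a zone that tries to keep a few spare slabs on its free list using
only non-blocking allocations. Every name in it (struct zone,
zone_refill, zone_get_slab, slab_alloc_nowait, backing_pages) is
hypothetical and not part of kmalloc_obj(); malloc() and a counter
stand in for the real page allocator. It only illustrates the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types and names; this is not the kmalloc_obj() API. */
struct slab {
	struct slab *next;
};

struct zone {
	struct slab *free_list;	/* spare slabs ready to hand out */
	int nfree;		/* number of slabs on free_list */
	int reserve;		/* target number of spare slabs ("X") */
};

/* Simulated supply of pages; 0 models a low-memory condition. */
static int backing_pages = 0;

/* Stand-in for a non-blocking page allocation: fail rather than wait. */
static struct slab *
slab_alloc_nowait(void)
{
	if (backing_pages == 0)
		return NULL;
	backing_pages--;
	return malloc(sizeof(struct slab));
}

/* Top the free list up toward the reserve target, never blocking. */
static void
zone_refill(struct zone *z)
{
	assert(z != NULL);
	while (z->nfree < z->reserve) {
		struct slab *s = slab_alloc_nowait();

		if (s == NULL)
			break;	/* low memory: keep what we have */
		s->next = z->free_list;
		z->free_list = s;
		z->nfree++;
	}
}

/* Refill ahead of demand, then pop one slab (NULL if none is left). */
static struct slab *
zone_get_slab(struct zone *z)
{
	struct slab *s;

	zone_refill(z);
	s = z->free_list;
	if (s != NULL) {
		z->free_list = s->next;
		z->nfree--;
	}
	return s;
}
```

With reserve set to 3 and backing_pages set to 10, the first
zone_get_slab() call tops the list up to three slabs and hands one out.
If backing_pages then drops to 0, two more calls still succeed from the
reserve and the next one returns NULL instead of blocking.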

Still assuming that's right, I think there's a reasonable chance I
could get SWAPMETA switched over, especially with a bit of
hand-holding. In theory your solution sounds straightforward, but I
have a lot of code to read and I'm sure I'll run into trouble

If someone else is planning to do it, they should probably go ahead.
Otherwise, my tentative next steps:

* Read (about) kmalloc_obj. I guess I'll start with the first commit
  message (e9dbfea1).

* Blindly switch swap_pager.c over to kmalloc_obj, and confirm the
  deadlock issue (and see if there are any other surprises). Hopefully
  I can make the deadlock reproducible.

* Implement your suggestion above to make the deadlock go away, using
  generous parameters (allocate plenty of extra slabs).

* Do some careful analysis to figure out how many extra slabs are
  actually needed, in order to fine-tune the implementation and be able
  to argue convincingly that no deadlocks are possible.

