falsifian at falsifian.org
Fri May 14 14:25:12 PDT 2021
On Fri, May 14, 2021 at 10:04:28AM -0900, Matthew Dillon wrote:
> Hi James. Yes, I think removing zalloc is still worth doing as a
> clean-up. Honestly I'd like to also get rid of objcache as well, replacing
> both zalloc and objcache with the new kmalloc_obj() facility. I don't
> think this would be a beginner project, though, but you could take a look
> at it and see if you think you can do it. I'll explain the complexity.
> For zalloc there is a chicken-and-egg problem. Currently three sub-systems
> still use zalloc:
> SWAPMETA - side-structures used to manage swap space
> PV ENTRY - side-structures used to manage page table pages
> MAP ENTRY - side-structures used to manage virtual memory mappings
> The main reasons these subsystems have not been converted are (A)
> bootstrapping in early boot and (B) low-memory deadlock issues.
> For (A) an example is that bootstrapping the kernel VM system in early boot
> requires a number of PV entries and map entry structures, but obviously
> those cannot be allocated that early in the boot so they must be
> pre-reserved. zalloc() has a mechanism to do this. To implement this
> pre-reservation with kmalloc_obj() would require reserving at least one
> slab for each of the three zones and enhancing the kmalloc_obj() subsystem
> to utilize the reserve during early boot. This part isn't difficult but
> it does entail some significant programming.
> For (B) an example would be... when a normal kmalloc() needs to allocate a
> new block of memory it calls into the VM system which might also need to
> allocate a PV entry or a MAP entry... which calls kmalloc() again and
> boom, low-memory deadlock. Similarly, if the pageout daemon is trying to
> page to swap due to a low-memory condition it may have to allocate
> swap-meta structures... but in a low-memory situation that can deadlock as well.
> Dealing with these issues requires reducing the number of side-structures
> that might need to be incidentally allocated in low-memory situations,
> meaning that if the zalloc subsystem is converted to kmalloc_obj(), then
> kmalloc_obj() needs to be enhanced to maintain a few extra slabs ready to
> go. I think implementing this enhancement to kmalloc_obj() would not be
> too hard. Basically, when kmalloc_obj() needs to allocate a new slab it
> will check to see how many are on the free list and when there are not
> enough it will pro-actively do non-blocking allocations ahead of the
> So for (A) we would need a mechanism to register one or more full slabs
> with a kmalloc_obj() zone, and for (B) the kmalloc_obj() function itself
> would pro-actively always try to maintain X extra slabs on its free list
> (non-blocking, so in a low-memory situation it will try to maintain the
> additional slabs but not block if it can't).
> So getting rid of zalloc is not trivial. zalloc deals with all of these
> situations and our current kmalloc_obj() facility does not. Nor does
> objcache really.
Thanks for taking the time to write that detailed answer. I hadn't seen
Is (A) really an issue for SWAPMETA? The current code in swap_pager.c
doesn't bother with zbootinit, and it looks like the code there isn't
run until fairly late in the boot process (when the pageout threads are
(B) sort of occurred to me, but I didn't realize zalloc is specifically
designed to help with that, so I had assumed stuff like
vm_page_count_severe() ensured enough memory was free to avoid the
issue (for SWAPMETA; (A) would still be a problem for the other two
subsystems). I guess those parameters are not tuned to guarantee (B)
won't happen for SWAPMETA?
Assuming I'm right about (A), would it make sense to start by solving
(B), i.e. make kmalloc_obj maintain extra slabs? Then SWAPMETA could be
switched over before (A) gets solved for the other two subsystems.
Still assuming that's right, I think there's a reasonable chance I
could get SWAPMETA switched over, especially with a bit of
hand-holding. In theory your solution sounds straightforward, but I
have a lot of code to read and I'm sure I'll run into trouble
If someone else is planning to do it, they should probably go ahead.
Otherwise, my tentative next steps:
* Read (about) kmalloc_obj. I guess I'll start with the first commit
* Blindly switch swap_pager.c over to kmalloc_obj, and confirm the
deadlock issue (and see if there are any other surprises). Hopefully
I can make the deadlock reproducible.
* Implement your suggestion above to make the deadlock go away, using
generous parameters (allocate plenty of extra slabs).
* Do some careful analysis to figure out how many extra slabs are
actually needed, in order to fine-tune the implementation and be able
to argue convincingly that no deadlocks are possible.