zalloc project

Fri May 14 14:25:12 PDT 2021

On Fri, May 14, 2021 at 10:04:28AM -0900, Matthew Dillon wrote:
> Hi James.   Yes, I think removing zalloc is still worth doing as a
> clean-up.  Honestly, I'd like to get rid of objcache as well, replacing
> both zalloc and objcache with the new kmalloc_obj() facility.  I don't
> think this would be a beginner project, but you could take a look
> at it and see if you think you can do it.  I'll explain the complexity.
> 
> For zalloc there is a chicken-and-egg problem.  Currently three sub-systems
> still use zalloc:
> 
> SWAPMETA  - side-structures used to manage swap space
> PV ENTRY  - side-structures used to manage page table pages
> MAP ENTRY - side-structures used to manage virtual memory mappings
> 
> The main reasons these subsystems have not been converted are (A)
> bootstrapping in early boot and (B) low-memory deadlock issues.
> 
> For (A) an example is that bootstrapping the kernel VM system in early boot
> requires a number of PV entries and map entry structures, but obviously
> those cannot be allocated that early in the boot so they must be
> pre-reserved.  zalloc() has a mechanism to do this.  Implementing this
> pre-reservation with kmalloc_obj() would require reserving at least one
> slab for each of the three zones and enhancing the kmalloc_obj() subsystem
> to utilize the reserve during early boot.   This part isn't difficult but
> it does entail some significant programming.
> 
> For (B), an example would be: when a normal kmalloc() needs to allocate a
> new block of memory, it calls into the VM system, which might also need to
> allocate a PV entry or a MAP entry, which calls kmalloc() again and...
> boom, low-memory deadlock.  Similarly, if the pageout daemon is trying to
> page to swap due to a low-memory condition, it may have to allocate
> swap-meta structures, but in a low-memory situation that can deadlock as
> well.
> 
> Dealing with these issues requires reducing the number of side-structures
> that might need to be incidentally allocated in low-memory situations,
> meaning that if the zalloc subsystem is converted to kmalloc_obj(), then
> kmalloc_obj() needs to be enhanced to maintain a few extra slabs ready to
> go.  I think implementing this enhancement to kmalloc_obj() would not be
> too hard: basically, when kmalloc_obj() needs to allocate a new slab, it
> will check how many are on the free list, and when there are not
> enough, it will proactively do non-blocking allocations ahead of
> demand.
> 
> So for (A) we would need a mechanism to register one or more full slabs
> with a kmalloc_obj() zone, and for (B) the kmalloc_obj() function itself
> would always proactively try to maintain X extra slabs on its free list
> (non-blocking, so in a low-memory situation it will try to maintain the
> additional slabs but not block if it can't).
> 
> So getting rid of zalloc is not trivial.  zalloc deals with all of these
> situations and our current kmalloc_obj() facility does not.  Nor, really,
> does objcache.
> 
> -Matt
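
Requirement (A), registering full slabs with a zone before the VM system can supply memory, might look roughly like the minimal C sketch below. It is not the kmalloc_obj() API: struct slab, struct obj_zone, zone_add_slab(), zone_alloc() and the vm_ready flag are all hypothetical, malloc() stands in for the VM system, and freeing objects is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical slab: a contiguous array of fixed-size objects handed out
 * in order.  Freeing objects is omitted to keep the sketch short. */
struct slab {
    struct slab *next;   /* next slab owned by the same zone */
    char        *mem;    /* object storage */
    size_t       nobjs;  /* capacity, in objects */
    size_t       used;   /* objects handed out so far */
};

/* Hypothetical zone; not the real kmalloc_obj() data structure. */
struct obj_zone {
    size_t       objsize;
    size_t       objs_per_slab;
    struct slab *slabs;     /* every slab owned by the zone */
    int          vm_ready;  /* 0 during early boot: no backing allocator */
};

/* Hand a slab to the zone.  Before the VM system is up, callers pass
 * statically reserved storage here, much as zbootinit() does for zalloc. */
static void zone_add_slab(struct obj_zone *z, struct slab *s,
                          void *mem, size_t nobjs)
{
    assert(nobjs > 0);
    s->mem = mem;
    s->nobjs = nobjs;
    s->used = 0;
    s->next = z->slabs;
    z->slabs = s;
}

static void *zone_alloc(struct obj_zone *z)
{
    struct slab *s;
    char *mem;

    /* Use any slab that still has room, boot-time reserves included. */
    for (s = z->slabs; s != NULL; s = s->next) {
        if (s->used < s->nobjs)
            return s->mem + z->objsize * s->used++;
    }

    /* All slabs are full.  During early boot there is nothing to fall
     * back on, so the pre-reserved slabs must be big enough. */
    if (!z->vm_ready)
        return NULL;

    /* malloc() stands in for asking the VM system for a new slab. */
    s = malloc(sizeof(*s));
    mem = malloc(z->objsize * z->objs_per_slab);
    if (s == NULL || mem == NULL) {
        free(s);
        free(mem);
        return NULL;
    }
    zone_add_slab(z, s, mem, z->objs_per_slab);
    s->used = 1;
    return s->mem;
}
```

Until vm_ready is set, the zone can only hand out what was donated up front, so the boot reserve has to cover every PV entry and map entry the bootstrap path needs.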

Thanks for taking the time to write that detailed answer. I hadn't seen
kmalloc_obj.

Is (A) really an issue for SWAPMETA? The current code in swap_pager.c
doesn't bother with zbootinit, and it looks like the code there isn't
run until fairly late in the boot process (when the pageout threads are
created).

(B) sort of occurred to me, but I didn't realize zalloc was specifically
designed to help with that, so I had assumed things like
vm_page_count_severe() ensured enough memory stayed free to avoid the
issue (for SWAPMETA; (A) would still be a problem for the other two
subsystems). I guess those parameters are not tuned to guarantee that (B)
won't happen for SWAPMETA?

Assuming I'm right about (A), would it make sense to start by solving
(B), i.e., making kmalloc_obj maintain extra slabs? Then SWAPMETA could be
switched over before (A) gets solved for the other two subsystems.
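
Starting with (B) could look something like this minimal C sketch of the proactive, non-blocking refill described above. Every name here is hypothetical rather than part of kmalloc_obj(), and a counter of simulated free slabs (sim_free_slabs) stands in for real memory pressure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simulated free memory, counted in slabs (stand-in for the VM system). */
static size_t sim_free_slabs;

/* Hypothetical non-blocking slab allocation, in the spirit of M_NOWAIT:
 * fail immediately instead of sleeping when memory is short. */
static void *slab_alloc_nowait(size_t slabsize)
{
    void *p;

    if (sim_free_slabs == 0)
        return NULL;
    p = malloc(slabsize);
    if (p != NULL)
        sim_free_slabs--;
    return p;
}

/* An empty spare slab links through its own first bytes. */
struct spare {
    struct spare *next;
};

struct obj_zone {
    size_t        slabsize;
    size_t        reserve;  /* target number of spare slabs ("X") */
    size_t        nspare;   /* slabs currently on the spare list */
    struct spare *spares;
};

/* Top up the spare list without blocking; stop at the first failure. */
static void zone_refill(struct obj_zone *z)
{
    assert(z->slabsize >= sizeof(struct spare));
    while (z->nspare < z->reserve) {
        struct spare *sp = slab_alloc_nowait(z->slabsize);
        if (sp == NULL)
            break;  /* low memory: keep what we have, don't wait */
        sp->next = z->spares;
        z->spares = sp;
        z->nspare++;
    }
}

/* Called when the zone needs a fresh slab for new objects. */
static void *zone_get_slab(struct obj_zone *z)
{
    void *slab;

    if (z->spares != NULL) {
        struct spare *sp = z->spares;
        z->spares = sp->next;
        z->nspare--;
        slab = sp;
    } else {
        /* A real allocator might fall back to a blocking allocation
         * here when the caller allows it; this sketch just tries once. */
        slab = slab_alloc_nowait(z->slabsize);
    }
    zone_refill(z);  /* proactively restore the reserve for later callers */
    return slab;
}
```

The refill is opportunistic: under memory pressure the reserve drains instead of blocking, and it is rebuilt the next time non-blocking allocations succeed. Choosing the value of reserve is the open question.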

Still assuming that's right, I think there's a reasonable chance I
could get SWAPMETA switched over, especially with a bit of
hand-holding. In theory your solution sounds straightforward, but I
have a lot of code to read and I'm sure I'll run into trouble
somewhere.

If someone else is planning to do it, they should probably go ahead.
Otherwise, my tentative next steps:

* Read (about) kmalloc_obj. I guess I'll start with the first commit
  message (e9dbfea1).

* Blindly switch swap_pager.c over to kmalloc_obj, and confirm the
  deadlock issue (and see if there are any other surprises). Hopefully
  I can make the deadlock reproducible.

* Implement your suggestion above to make the deadlock go away, using
  generous parameters (allocate plenty of extra slabs).

* Do some careful analysis to figure out how many extra slabs are
  actually needed, in order to fine-tune the implementation and be able
  to argue convincingly that no deadlocks are possible.
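
Once the analysis yields a worst-case object count, the last step reduces to a ceiling division. A minimal sketch, assuming hypothetical helper names; worst_case_objs is the figure the analysis would have to establish (for example, how many SWAPMETA structures the pageout path might allocate before it manages to free memory).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: objects that fit in one slab. */
static size_t objs_per_slab(size_t slab_bytes, size_t obj_bytes)
{
    assert(obj_bytes > 0 && obj_bytes <= slab_bytes);
    return slab_bytes / obj_bytes;
}

/* Hypothetical helper: spare slabs needed to cover worst_case_objs
 * allocations made while no new memory can be obtained. */
static size_t reserve_slabs(size_t worst_case_objs, size_t per_slab)
{
    assert(per_slab > 0);
    /* Round up: a partially needed slab must still be reserved whole. */
    return (worst_case_objs + per_slab - 1) / per_slab;
}
```

For instance, with illustrative 4096-byte slabs and 64-byte objects, objs_per_slab() gives 64, and a worst case of 65 objects needs reserve_slabs(65, 64) == 2 spare slabs.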

-- 
James