full/empty terminology in _malloc.h

James Cook falsifian at falsifian.org
Wed May 19 06:26:24 PDT 2021


On Tue, May 18, 2021 at 04:50:36PM -0900, Matthew Dillon wrote:
> Now onto malloc_mgt_poll_empty_locked().  This code should strictly keep
> only empty magazines on the empty list and full magazines on the full
> list.  Any partial magazines will be moved to the partial list.  The order
> on the empty and full lists shouldn't matter at all.
> 
> The partial list is more problematic.  For that we dive into _kmalloc_obj()
> itself.  It will allocate out of mgt->active first, then mgt->alternate.
> If both are empty (have no objects available to allocate), then it pulls a
> slab off of the (per-zone) global ggm->partial list, then ggm->full list,
> then checks a few ggm->empty elements to see if any happen to have any
> objects in them (the poller might not have moved them to the appropriate
> list).
> 
> This is the part where we still have a fragmentation issue.  The
> ggm->partial list is not sorted in any way and it would probably be best if
> we took the 'most empty' of the slabs from the partial list to make the new
> active instead of just taking the first one.  Scanning the whole partial
> list would be expensive, but scanning a couple of slabs off of the partial
> list might be beneficial.
> -Matt


An idea and a bunch of questions.


Idea:

To quickly find an approximately-most-empty partial slab, how about
replacing the "partial" field with an array of "partial" buckets,
sorted from most empty to most full? E.g.

- partial[0]:  0-20% full (but no empty slabs)
- partial[1]: 10-30% full
- partial[2]: 20-40% full

etc.

	struct kmalloc_slab	*partial[NUM_BUCKETS];	/* global */
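To make the overlapping ranges concrete, here is a userspace sketch of
one possible bucket rule. The percentages, the NUM_BUCKETS value, and
the function names are all made up for illustration; the point is that
adjacent buckets overlap, so a slab hovering near a boundary doesn't
ping-pong between buckets on every alloc/free:

```c
#include <assert.h>

#define NUM_BUCKETS	9	/* hypothetical; covers 0-100% in 10% steps */

/*
 * A slab placed in bucket b counts as correctly placed while its fill
 * percentage remains within [10*b, 10*b + 20].  Adjacent buckets
 * overlap by 10%.
 */
static int
bucket_ok(int fill_pct, int b)
{
	return fill_pct >= 10 * b && fill_pct <= 10 * b + 20;
}

/* Initial bucket for a slab being demoted to the partial lists. */
static int
bucket_for(int fill_pct)
{
	int b = fill_pct / 10;

	return b >= NUM_BUCKETS ? NUM_BUCKETS - 1 : b;
}
```

With this rule a slab placed at 15% full lands in bucket 1 and can
drift anywhere in 10-30% before it needs to move.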

A slab could end up in the wrong bucket as _kfree_obj fills it (unless
you're willing to grab a lock in _kfree_obj), but that could be worked
around as follows. Replace the "if (ggm->partial) {...}" block in
_kmalloc_obj with something like the following:

	moves = 0;
	for (i = 0; i < NUM_BUCKETS; ++i) {
		while (ggm->partial[i] && (*(ggm->partial[i]) is in the wrong bucket)) {
			if (++moves > MAX_MOVES_PER_ALLOC)
				break;  /* limit moves per alloc */
			tmp = ggm->partial[i];
			ggm->partial[i] = ggm->partial[i]->next;
			tmp->next = ggm->partial[(correct bucket)];
			ggm->partial[(correct bucket)] = tmp;
		}
		if (ggm->partial[i])
			break;
	}
	if (i < NUM_BUCKETS)
		(use ggm->partial[i] as the new active slab)

It shouldn't take too long: NUM_BUCKETS and MAX_MOVES_PER_ALLOC limit
the work done. (One wrinkle: a slab relocated to a lower-numbered
bucket won't be found again until a later allocation, but
_kmalloc_obj's existing fallbacks to the full and empty lists cover
that case.)
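For concreteness, here is a self-contained userspace model of that
loop. Everything is simplified and hypothetical (a bare linked list, a
non-overlapping bucket rule, small constants); it only demonstrates
the relocate-then-pick mechanics, not the kernel's actual structures:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUCKETS		4	/* hypothetical values */
#define MAX_MOVES_PER_ALLOC	2
#define OBJS_PER_SLAB		16

struct slab {
	struct slab	*next;
	int		used;		/* objects currently allocated */
};

static struct slab *partial[NUM_BUCKETS];

/* Bucket 0 holds the most-empty slabs (no overlap here, to keep the
 * model short; overlapping ranges would work the same way). */
static int
bucket(const struct slab *s)
{
	return s->used * NUM_BUCKETS / (OBJS_PER_SLAB + 1);
}

/*
 * Pop an approximately-most-empty partial slab, relocating at most
 * MAX_MOVES_PER_ALLOC misplaced slabs along the way.
 */
static struct slab *
pick_partial(void)
{
	struct slab *tmp;
	int i, moves = 0;

	for (i = 0; i < NUM_BUCKETS; ++i) {
		while (partial[i] && bucket(partial[i]) != i) {
			if (++moves > MAX_MOVES_PER_ALLOC)
				break;		/* limit moves per alloc */
			tmp = partial[i];
			partial[i] = tmp->next;
			tmp->next = partial[bucket(tmp)];
			partial[bucket(tmp)] = tmp;
		}
		if (partial[i])
			break;
	}
	if (i == NUM_BUCKETS)
		return NULL;
	tmp = partial[i];
	partial[i] = tmp->next;
	return tmp;
}
```

A misplaced near-empty slab in a high-numbered bucket gets relocated
down on one allocation and picked up on the next, which is the
amortized behavior described above.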

One nice property: we can do a worst-case analysis of the number of
times this line gets reached:

	break;  /* limit moves per alloc */

Specifically, I think this is true:

  W - F <= P * NUM_BUCKETS / MAX_MOVES_PER_ALLOC

  where

  W = # times that line is reached
  P = # times a slab gets moved to partial
  F = # times a slab gets moved to full

In other words: either W is small (you usually choose the right partial
bucket) or F is big (lots of slabs can be freed back to the VM). Either
way, a win against fragmentation.


Questions:

- How do you measure fragmentation? Do the MemUse vs. SlabUse columns
  in vmstat -m tell you space used by in-use objects vs. total space
  used by slabs?


- Do you have a way of measuring whether the passive polling
  (malloc_mgt_poll_empty_locked) is working well? E.g. not missing too
  many opportunities to re-use a slab.

  Why did you choose to use polling, instead of taking care of it in
  _kfree_obj directly? Is it to avoid having to lock the global mgt
  structure?


- What's the difference between atomic_fetchadd_long and
  atomic_add_long? (Both used in _kfree_obj.)


- Is a call to cc_sfence() or something needed between these two lines
  in _kfree_obj?

	slab->fobjs[i] = obj;
	atomic_add_long(&slab->xindex, 1);	/* synchronizer */

  If the stores get re-ordered, another thread might try to read the
  new fobjs entry before it's ready. Or does atomic_add_long ensure
  ordering? (I have pretty much zero experience with concurrent
  programming.)
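  For what it's worth, the pattern I'm asking about can be modelled in
  userspace with C11 atomics. All names and types here are illustrative
  stand-ins, not the kernel's code; the release store is one way to
  guarantee the fobjs[] entry is visible before the new index:

	#include <assert.h>
	#include <pthread.h>
	#include <stdatomic.h>

	#define NOBJ	10000

	static int	payload[NOBJ];		/* dummy objects */
	static void	*fobjs[NOBJ];		/* stand-in for slab->fobjs */
	static atomic_long xindex;		/* stand-in for slab->xindex */

	/* Producer: publish an object, then advance the index.  The
	 * release store orders the fobjs[] store before the index
	 * store, as observed by an acquire load. */
	static void *
	producer(void *arg)
	{
		long i;

		for (i = 0; i < NOBJ; ++i) {
			fobjs[i] = &payload[i];
			atomic_store_explicit(&xindex, i + 1,
			    memory_order_release);
		}
		return NULL;
	}

	/* Consumer: an acquire load of the index makes every fobjs[]
	 * entry below it safe to read. */
	static void *
	consumer(void *arg)
	{
		long seen = 0, avail;

		while (seen < NOBJ) {
			avail = atomic_load_explicit(&xindex,
			    memory_order_acquire);
			while (seen < avail) {
				assert(fobjs[seen] == &payload[seen]);
				++seen;
			}
		}
		return NULL;
	}

  (On x86 the hardware doesn't reorder stores with other stores, but
  without the release annotation or some barrier the compiler is still
  free to reorder the two plain stores.)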

-- 
James


More information about the Kernel mailing list