full/empty terminology in _malloc.h
James Cook
falsifian at falsifian.org
Wed May 19 06:26:24 PDT 2021
On Tue, May 18, 2021 at 04:50:36PM -0900, Matthew Dillon wrote:
> Now onto malloc_mgt_poll_empty_locked(). This code should strictly keep
> only empty magazines on the empty list and full magazines on the full
> list. Any partial magazines will be moved to the partial list. The order
> on the empty and full lists shouldn't matter at all.
>
> The partial list is more problematic. For that we dive into _kmalloc_obj()
> itself. It will allocate out of mgt->active first, then mgt->alternate.
> If both are empty (have no objects available to allocate), then it pulls a
> slab off of the (per-zone) global ggm->partial list, then ggm->full list,
> then checks a few ggm->empty elements to see if any happen to have any
> objects in them (the poller might not have moved them to the appropriate
> list).
>
> This is the part where we still have a fragmentation issue. The
> ggm->partial list is not sorted in any way and it would probably be best if
> we took the 'most empty' of the slabs from the partial list to make the new
> active instead of just taking the first one. Scanning the whole partial
> list would be expensive, but scanning a couple of slabs off of the partial
> list might be beneficial.
> -Matt
An idea and a bunch of questions.
Idea:
In order to quickly find an approximately-most-empty partial slab,
how about replacing the "partial" field with an array of "partial"
buckets, sorted from most empty to most full? E.g.
- partial[0]: 0-20% full (but no empty slabs)
- partial[1]: 10-30% full
- partial[2]: 20-40% full
etc.
struct kmalloc_slab *partial[NUM_BUCKETS]; /* global */
A slab could end up in the wrong bucket as _kfree_obj fills it (unless
you're willing to grab a lock in _kfree_obj), but that could be worked
around as follows. Replace the "if (ggm->partial) {...}" block in
_kmalloc_obj with something like the following:
moves = 0;
for (i = 0; i < NUM_BUCKETS; ++i) {
    while (ggm->partial[i] && (*(ggm->partial[i]) is in the wrong bucket)) {
        if (++moves > MAX_MOVES_PER_ALLOC)
            break; /* limit moves per alloc */
        /* unlink the head and push it onto its correct bucket */
        tmp = ggm->partial[i];
        ggm->partial[i] = ggm->partial[i]->next;
        tmp->next = ggm->partial[(correct bucket)];
        ggm->partial[(correct bucket)] = tmp;
    }
    if (ggm->partial[i])
        break;
}
if (i < NUM_BUCKETS)
    (use ggm->partial[i] as the new active slab)
It shouldn't take too long: NUM_BUCKETS and MAX_MOVES_PER_ALLOC limit
the work done.
One nice property: we can do a worst-case analysis of the number of
times this line gets reached:
break; /* limit moves per alloc */
Specifically, I think this is true:
W - F <= P * NUM_BUCKETS / MAX_MOVES_PER_ALLOC
where
W = # times that line is reached
P = # times a slab gets moved to partial
F = # times a slab gets moved to full
In other words: either W is small (you usually choose the right partial
bucket) or F is big (lots of slabs can be freed back to the VM). Either
way, a win against fragmentation.
Questions:
- How do you measure fragmentation? Do the MemUse vs. SlabUse columns
in vmstat -m tell you space used by in-use objects vs. total space
used by slabs?
- Do you have a way of measuring whether the passive polling
(malloc_mgt_poll_empty_locked) is working well? E.g. not missing too
many opportunities to re-use a slab.
- Why did you choose polling instead of taking care of it in
_kfree_obj directly? Is it to avoid having to lock the global mgt
structure?
- What's the difference between atomic_fetchadd_long and
atomic_add_long? (Both used in _kfree_obj.)
- Is a call to cpu_sfence() or something needed between these two lines
in _kfree_obj?
    slab->fobjs[i] = obj;
    atomic_add_long(&slab->xindex, 1); /* synchronizer */
If the stores get re-ordered, another thread might try to read the
new fobjs entry before it's ready. Or does atomic_add_long ensure
ordering? (I have pretty much zero experience with concurrent
programming.)
--
James