lkwt in DragonFly

Matthew Dillon dillon at apollo.backplane.com
Mon Feb 9 01:31:24 PST 2004


:    Anyway, so for the UMA _keg_ extensions, since there's no
:    interlocking, the replacement was next to trivial and the amount of
:    code within the critical section was minimal (all you do is check
:    the keg and if non-empty, allocate from it or exit the
:    critical section and do something else).  And it is precisely with
:    this change that I noticed a slight pessimization.  So either I
:    grossly underestimated the number of interrupts that occur on
:    average while in that critical section or the cost of
:    entering/exiting the critical section is at least as high as that of
:    grabbing/dropping a mutex.  Again, this may not be the case for
:    DragonFly. [Ed.: And now that I've read what follows, I predict it
:    likely isn't]

    Ok, I understand what you have done.  I've looked at the FreeBSD-5
    code and from what I can tell my optimized critical section code
    was either never committed or it was backed out.  FreeBSD-5 seems
    to be doing a lot of sti/cli junk and that is going to be far worse
    than obtaining or releasing a mutex.  DFly does not have that issue.

    What you are doing in UMA we tend to do with discrete APIs which
    operate on the globaldata structure.  For example, I cache struct
    thread's within a discrete API and do not use a generalized
    (UMA-like) infrastructure that could cover multiple data types.
    I have found that a discrete implementation for simple typed data
    structure caches like these is far more understandable and far easier
    to code and maintain, and we will probably do something similar for
    the mbuf and mbuf+cluster allocations in DFly.  Our slab allocator
    is also per-cpu, but I have not attempted (nor do I believe it is
    a good idea) to integrate type-stable caching within the
    slab allocator's API abstraction.  The reason is simply that differently
    typed structures have different requirements that cannot be optimally
    met with a single, general, CTOR/DTOR style API.

:>     The DragonFly slab allocator does not need to use a mutex or token
:>     at all and only uses a critical section for very, very short periods 
:>     of time within the code.  I have suggested that Jeff recode UMA to 
:>     remove the UMA per-cpu mutex on several occasions.
:
:    I have been in touch with Jeff regarding UMA issues for a long while
:    and he has mentioned that he did exactly that, several months ago.
:    However, I'm not sure exactly what preempted that work from going in.
:    It's very possible that the interlocking issues involving being in
:    the critical section and having to grab the zone lock in the case
:    where the pcpu cache was empty remained unresolved.  Also, since I
:    did something similar (and simpler) and noticed a pessimisation,
:    actual performance would have to be evaluated prior to making a
:    change like that - and perhaps it was only to find that performance
:    was worse.

    Dealing with cache exhaustion is definitely an issue but I do not see
    the issue as being much different from how the mbuf cluster code worked
    in the first place.

:    If you are already in a critical section, the cost is negligible.
:    If you are not, which is ALWAYS when it comes to the UMA keg code,
:    then you always disable interrupts.  I remember you a while back
:    committing changes that made the general critical section enter and
:    exit faster in the common case, deferring the cli to the scenario
:    where an interrupt actually occurs.  I don't remember the details
:    behind the backout.  I guess I'd have to dig up the archives.

    John seems to want to insist on a complex, machine-specific critical
    section API.  IMHO it's a mistake.  By not guaranteeing fast critical
    section code people coding for FreeBSD-5 are basically left with only
    one 'optimized' locking interface... that being the mutex interface,
    and wind up going through massive convolutions to use mutexes for things
    that mutexes should not really be necessary for.

:    Perhaps CPU migration should not be permitted as a side-effect of
:    being pre-empted within the kernel, then we can consider similar
:    optimisations in FreeBSD 5.x.  Prior to that, however, I wonder what
:    measurable gains there are from allowing full-blown pre-emption with
:    CPU migration within the kernel, if any.  I'll assume for the moment
:    that there is a reasonable rationale behind that design decision.
:
:--
:Bosko Milekic  *  bmilekic at xxxxxxxxxxxxxxxx  *  bmilekic at xxxxxxxxxxx
:TECHNOkRATIS Consulting Services  *  http://www.technokratis.com/

    You will never see any statistical benchmark gains from allowing
    non-interrupt preemption.  Never, ever.  Preemption is, by definition,
    interrupting work that the cpu must do anyway with more work that the
    cpu must do anyway.  Without preemption the mainline thread work winds
    up being serialized, which will actually be MORE optimal than allowing
    the preemption since there is less L1 cache pollution and the thread
    work in question is usually short-lived anyhow.  Any embedded systems
    programmer will tell you the same thing and the concepts apply to
    both DFly and FreeBSD big-time.

    The only thing you get from kernel preemption is potentially better
    responsiveness to interrupts when they block on something.  That's it,
    nothing else.... and, frankly, FreeBSD-5's priority borrowing is nothing
    more than one huge hack which tries to work around the truly horrible
    coding practices being used with mutexes from interrupts.  This is one
    reason why I prefer discrete APIs for specialized operations such as
    mbuf allocation which might have to be done from an interrupt.  Trying
    to squeeze everything into one generalized API makes it impossible to
    tailor a general API for interrupt or non-interrupt use without adding
    some severe hacks (like priority borrowing).

    Keep in mind that there are two kinds of preemption that we are talking
    about here.  DragonFly allows interrupt threads to preempt mainline
    kernel threads, just like FreeBSD.  What DragonFly does not allow
    (and FreeBSD does allow) is for an interrupt thread to switch to
    a non-interrupt thread before returning to the originally interrupted
    non-interrupt thread.  So in DFly preemption is a well understood
    interface that allows us to make certain assumptions about what kinds
    of things can preempt mainline kernel code, while in FreeBSD preemption
    is 'wild'... anything can happen at any time, making it impossible to
    have assumptions that could otherwise be used to optimize performance.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>
