DragonFlyBSD threading model

Mon Aug 1 08:17:34 PDT 2005

:Hi,
:
:I have been reading up about how the different BSD systems are
:handling SMP and threading. I gather NetBSD uses scheduler
:activations, where userland programs are given virtual CPUs which then
:get mapped to kernel threads for execution (please correct me if my
:analysis is wrong). FreeBSD appears to use a variant of this idea
:called KSEs. I believe this is known as M:N.
:
:I am not however sure exactly how DragonFlyBSD threads work - each CPU
:has an LWKT scheduler, and that above this layer there is also a
:userland scheduler (partially equivalent to the functionality provided
:by sched_4bsd.c under FreeBSD?) - but my question is how that relates
:to userland programs - when they have threads..does each program
:thread map directly to one of the LWKT schedulers on one of the CPUs?
:Does that make it 1:1?
:
:I apologise if this seems like an extremely novice question, this stuff
:is all very new to me.
:
:Thanks in advance, Alex J Burke.

    Basically everything gets mapped onto an LWKT (Light Weight Kernel Thread),
    including userland processes.

    The LWKT scheduler is a fixed priority scheduler.  For example, if you 
    do a 'ps ax' you will see a ton of kernel threads in addition to userland,
    including threads for handling interrupts.  Priorities work something
    like this:

    HIGHEST
	Interrupt threads
	Soft interrupt threads
	Kernel threads (pageout daemon, network protocol stacks, etc)
	User Threads running in the kernel on behalf of userland
	User Threads running in userland
	LWKT scheduler helper thread (helps with cross-cpu scheduling)
	Idle thread
    LOWEST

    The LWKT scheduler implements minimal preemption for interrupt threads
    and will round-robin at the same priority level, but basically the
    highest priority runnable thread is the one that gets run.
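
    As a rough illustration of the rule above, here is a minimal sketch
    of a fixed priority pick with round-robin inside a level.  It is a
    toy model, not the kernel code: the class names, method names and
    numeric priority values are all made up for the example, and
    interrupt preemption is not modeled.

```python
from collections import deque
from enum import IntEnum


class Prio(IntEnum):
    # Hypothetical values; only the ordering (from the list above) matters.
    IDLE = 0
    SCHED_HELPER = 1
    USER_IN_USERLAND = 2
    USER_IN_KERNEL = 3
    KERNEL_THREAD = 4
    SOFT_INTERRUPT = 5
    INTERRUPT = 6


class FixedPriorityScheduler:
    """Toy per-cpu run queue: one FIFO per priority level."""

    def __init__(self):
        self.queues = {p: deque() for p in Prio}

    def make_runnable(self, name, prio):
        # New or requeued threads go to the tail of their own level,
        # which gives round-robin among threads of equal priority.
        self.queues[prio].append(name)

    def pick_next(self):
        # Scan from the highest level down; the first non-empty level wins.
        for prio in sorted(Prio, reverse=True):
            if self.queues[prio]:
                return self.queues[prio].popleft(), prio
        return None
```

    Requeueing a thread with make_runnable() after it has run puts it
    behind its peers at the same level, so two kernel threads alternate
    while a lower priority user thread waits until both have blocked.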

    The userland scheduler is responsible for handling user threads only
    while they are running in (or about to return to) userland.  The
    userland scheduler will only schedule one thread at a time per cpu on
    behalf of userland.

    So, for example, if there are 100 user processes all ready to run, the
    userland scheduler will only schedule one of those (per cpu) onto the 
    LWKT scheduler.

    If there are 100 user processes all ready to run and 10 of them are 
    in the kernel, those 10 are directly scheduled via LWKT and the 
    userland scheduler only deals with the remaining 90, choosing one at 
    a time (per cpu) to schedule via LWKT.  The userland scheduler is
    preemptive, meaning that it can choose to deschedule one userland
    process and reschedule another to LWKT at any time.
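
    The 100-process example can be sketched for a single cpu as below.
    This is only a model under simplifying assumptions: the names
    UserlandScheduler, select(), preempt() and lwkt_runnable() are
    invented for the illustration, and the selection policy is a plain
    FIFO rather than whatever policy a real userland scheduler uses.

```python
from collections import deque


class UserlandScheduler:
    """Toy model of a userland scheduler for one cpu (hypothetical API)."""

    def __init__(self):
        self.waiting = deque()  # ready to run in userland, not handed to LWKT
        self.current = None     # the single thread currently handed to LWKT

    def add_ready(self, name):
        self.waiting.append(name)

    def select(self):
        # Hand at most one userland thread to LWKT at a time.
        if self.current is None and self.waiting:
            self.current = self.waiting.popleft()
        return self.current

    def preempt(self):
        # Deschedule the current choice and pick another one.
        if self.current is not None:
            self.waiting.append(self.current)
            self.current = None
        return self.select()


def lwkt_runnable(in_kernel, usched):
    """Threads LWKT sees on this cpu: every in-kernel thread, plus at
    most one thread chosen by the userland scheduler."""
    chosen = usched.select()
    return list(in_kernel) + ([chosen] if chosen is not None else [])
```

    With 10 threads in the kernel and 90 given to the userland
    scheduler, lwkt_runnable() returns 11 entries: the 10 in-kernel
    threads and the one userland thread selected for this cpu.  The
    other 89 stay on the userland scheduler's own queue.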

    We don't have a priority inversion problem for e.g. idle vs normal vs
    'realtime' (which isn't real realtime) userland processes because the
    userland scheduler does not deal with threads once they have entered
    the kernel.  That's why there are two priorities associated with
    user threads.
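
    One way to picture the two priorities, as a sketch only: the
    userland class (idle, normal, 'realtime') is consulted by the
    userland scheduler, while the LWKT priority depends only on whether
    the thread is currently in the kernel.  The field names and numeric
    values below are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical LWKT levels; only their relative order is taken from
# the priority list earlier in this message.
LWKT_USER_IN_USERLAND = 2
LWKT_USER_IN_KERNEL = 3


@dataclass
class UserThread:
    name: str
    user_class: str          # "idle", "normal" or "realtime"; userland scheduler only
    in_kernel: bool = False  # set while executing in the kernel

    def lwkt_priority(self):
        # The userland class is deliberately ignored here: once a thread
        # is in the kernel it competes at the in-kernel level no matter
        # how the userland scheduler would rank it.
        return LWKT_USER_IN_KERNEL if self.in_kernel else LWKT_USER_IN_USERLAND
```

    In this model an idle-class thread that has entered the kernel
    outranks a 'realtime' thread that is still in userland, so it is
    not held off by userland priorities while it is inside the kernel.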

    This separation also allows us to theoretically implement multiple
    userland schedulers, either exclusively or in tandem, or even locked
    to a subset of available cpus.  These are features I want to add.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>