Userland threading and messaging

Mon Aug 18 09:00:56 PDT 2003

:> most if not all syscalls / messages asynchronous. The scheduling
:> and esp. the handling of different userland threads needs some
:> context switching which could be copied from the kernel. The
:> signal code could be reused as upcall mechanism to notify the
:> process about POSIX signals, finished messages or whatever we
:> want. That's all fine.
:>
:
:What about thread-to-processor binding?  For compute-bound programs it is 
:very nice to be able to explicitly stick threads onto different CPUs.  
:For that there would need to be some kind of kernel code at least.
:
:[snip]
:>
:> Joerg
:
:-wd
:-- 
:chip norkus; renaissance hacker;                wd at xxxxxxxx

    There are lots of solutions here.  Since we are virtualizing
    cpus by rfork()ing, it comes down to creating a model whereby
    a process can be bound to a subset of cpus.

    Ultimately this means partitioning the userland scheduler.
    Instead of having one userland scheduler which all processes
    are bound to, we would have N userland schedulers, and
    scheduling entities could be created and destroyed on the fly.

    Right now our userland scheduler integrates a realtime queue,
    idletime queue, and normal queue.  We would want to rip all
    that out and have 'each' userland scheduler implement only
    one queue but then give it a different LWKT priority, so:

    usched IDLE on cpu* lwktpri 4
    usched NORM on cpu* lwktpri 6
    usched REAL on cpu* lwktpri 8
    usched MYCUSTOM on cpu1 cpu2 lwktpri 4 locked
    ...
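
    A minimal sketch of how a table like the one above might be modeled.
    The 'usched' line syntax is only the brainstormed notation from this
    message, not an existing command, and the names Usched, parse_usched
    and dispatch_order are hypothetical.  It also assumes that a larger
    lwktpri value is the preferred one, which the IDLE < NORM < REAL
    ordering suggests but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usched:
    name: str
    cpus: tuple          # e.g. ("cpu1", "cpu2"); ("cpu*",) stands for every cpu
    lwktpri: int
    locked: bool = False


def parse_usched(line):
    """Parse one hypothetical 'usched NAME on CPU... lwktpri N [locked]' line."""
    tok = line.split()
    if len(tok) < 6 or tok[0] != "usched" or tok[2] != "on":
        raise ValueError("not a usched line: %r" % line)
    i = tok.index("lwktpri")          # raises ValueError if the keyword is missing
    if i + 1 >= len(tok):
        raise ValueError("missing priority: %r" % line)
    cpus = tuple(tok[3:i])
    flags = tok[i + 2:]
    if not cpus or any(f != "locked" for f in flags):
        raise ValueError("bad cpu list or flag: %r" % line)
    return Usched(tok[1], cpus, int(tok[i + 1]), "locked" in flags)


TABLE = [parse_usched(l) for l in (
    "usched IDLE on cpu* lwktpri 4",
    "usched NORM on cpu* lwktpri 6",
    "usched REAL on cpu* lwktpri 8",
    "usched MYCUSTOM on cpu1 cpu2 lwktpri 4 locked",
)]


def dispatch_order(table):
    """Scheduler names, most-preferred first (assumes larger lwktpri wins)."""
    return [u.name for u in sorted(table, key=lambda u: -u.lwktpri)]
```

    Each entry carries exactly one queue's worth of policy; the relative
    ordering between the queues comes only from the LWKT priority.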

    A threaded program would be able to choose a general userland
    scheduler, like 'MYCUSTOM', and simply rfork() away and let the
    kernel deal with the cpu assignments.  A 'locked' userland
    scheduler would assign the rforks to specific cpus and leave
    them there; an 'unlocked' userland scheduler would be free to
    shift processes around between the cpus under its wing.
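
    The locked/unlocked distinction can be sketched as a toy placement
    model.  Everything here is an assumption made for illustration: the
    class name, the attach/detach/rebalance methods, and the least-loaded
    placement policy are hypothetical, and nothing below reflects actual
    kernel code.

```python
class Usched:
    """Toy userland scheduler that places rfork()ed processes on its cpus."""

    def __init__(self, name, cpus, locked):
        self.name = name
        self.cpus = list(cpus)
        self.locked = locked
        self.placement = {}               # pid -> cpu

    def _load(self):
        load = {cpu: 0 for cpu in self.cpus}
        for cpu in self.placement.values():
            load[cpu] += 1
        return load

    def attach(self, pid):
        """Place a new process on the least-loaded cpu (first one on ties)."""
        load = self._load()
        cpu = min(self.cpus, key=load.get)
        self.placement[pid] = cpu
        return cpu

    def detach(self, pid):
        """Remove an exited process from the table."""
        del self.placement[pid]

    def rebalance(self):
        """Even out the load; returns how many processes were moved.

        A locked scheduler never moves anything once it has been placed.
        """
        if self.locked:
            return 0
        moved = 0
        while True:
            load = self._load()
            busiest = max(self.cpus, key=load.get)
            idlest = min(self.cpus, key=load.get)
            if load[busiest] - load[idlest] <= 1:
                return moved
            pid = next(p for p, c in self.placement.items() if c == busiest)
            self.placement[pid] = idlest
            moved += 1
```

    In this model the threaded program only calls attach(); whether its
    processes later migrate is a property of the scheduler it chose, not
    of the program.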

    I think it is fairly important to maintain a distinction
    between physical cpus and logical (virtualized) cpus.  Our virtualized
    cpu abstraction is still simply doing an rfork().  The real cpus all
    those 'virtualized' cpus get assigned to would depend on the scheduler
    specified.  It would also be nice if the schedulers were recursive, so
    an administrator could create a 'virtual' 4-cpu scheduler that is 
    assigned to a pool of 8 cpus (or an 8-cpu scheduler that is assigned
    to a pool of 4 cpus) and hand that entity off to the userland which would
    then be able to subdivide it with further usched commands.
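
    One way to picture recursive schedulers is as a tree in which each
    level maps its virtual cpus onto the cpus of the level above.  The
    sketch below is hypothetical throughout: the class, the subdivide()
    and physical() methods, and in particular the striping rule used to
    spread virtual cpus over a larger or smaller pool are assumptions,
    not a proposed design.

```python
class Usched:
    """Toy recursive scheduler: `ncpus` virtual cpus on top of `backing`.

    `backing` lists cpu ids in the parent's numbering, or physical cpu
    ids when there is no parent.
    """

    def __init__(self, name, ncpus, backing, parent=None):
        if ncpus < 1 or not backing:
            raise ValueError("need at least one virtual and one backing cpu")
        self.name = name
        self.ncpus = ncpus
        self.backing = list(backing)
        self.parent = parent

    def subdivide(self, name, ncpus, vcpus):
        """Create a child scheduler backed by some of our virtual cpus."""
        for v in vcpus:
            if not 0 <= v < self.ncpus:
                raise ValueError("cpu %r is outside scheduler %s" % (v, self.name))
        return Usched(name, ncpus, vcpus, parent=self)

    def candidates(self, vcpu):
        """Backing cpus that virtual cpu `vcpu` may run on (assumed striping)."""
        if not 0 <= vcpu < self.ncpus:
            raise IndexError(vcpu)
        n = len(self.backing)
        if self.ncpus <= n:
            # Fewer virtual than backing cpus: each virtual cpu gets a stripe.
            return self.backing[vcpu::self.ncpus]
        # More virtual than backing cpus: several virtual cpus share one.
        return [self.backing[vcpu % n]]

    def physical(self, vcpu):
        """Resolve a virtual cpu to the sorted physical cpus it can land on."""
        out = set()
        for b in self.candidates(vcpu):
            if self.parent is None:
                out.add(b)
            else:
                out.update(self.parent.physical(b))
        return sorted(out)


# A 'virtual' 4-cpu scheduler on a pool of 8 physical cpus, then a
# 2-cpu scheduler carved out of its virtual cpus 2 and 3.
admin4 = Usched("ADMIN4", 4, range(8))
user2 = admin4.subdivide("USER2", 2, [2, 3])
```

    The child never sees physical cpu numbers; it can only name cpus
    that its parent handed to it, which is what keeps the subdivision
    contained.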

    This is just brainstorming on my part... the only thing I know for
    certain is that I want to retain the rfork() model for virtualizing
    cpus.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>