On SMP

Sat Jan 22 16:46:00 PST 2005

:Hi Matt,
:
:based on this post and the one about X.org running with threads, there is
:something I don't understand.
:
:DFly has light-weight kernel threads, right?  So more than one thread
:(process?) can be in the kernel at once, right?
:
:What is the issue with userland threads, pthreads, libc_r, and so on?  What
:is the 1:1 threading library you mentioned?
:
:Sorry for the naive questions, but I guess I've never harmonized my generic
:knowledge of threads (Win32, python, Java) with POSIX/*nix/(DFly)BSD
:threading primitives.
:
:Jonathon

    It's a good question.  I'll try to give a summary.

    First, don't get confused between user threads and kernel threads.
    They are wholly different entities.  The fact that DragonFly has
    kernel threads and kernel processes does not help us deal with
    the userland threading problem much at all.

    There are two major facets involved with userland threading:

    (1) The concept of a user 'thread'.  Every user thread has an execution
	context and its own stack, and other things.

    (2) The concept of a kernel context, used when the userland thread
	performs a system call.  A kernel context needs its own stack.

    Now, in a non-threaded program there is only one user 'thread' and
    only one kernel context (the kernel process).  It's ok for the
    one user thread to make a system call which might block in the kernel
    because, well, there is only one thread anyway.

    In a multi-threaded userland program you have M user threads rather
    than just one.

    The problem the threading library faces is how to handle the case 
    when the kernel blocks.  If a threading library is trying to manage,
    say, 100 userland threads with only one kernel context, then it cannot
    allow the kernel to block on any system call because that would wind
    up blocking all 100 threads instead of just the one that made the system
    call.

    There are many ways to solve the problem.  Here are four of them:

    * The threading library implements an M:1 model.  This is the 'select'
      or 'kqueue' method of dealing with blocking conditions.  The threading
      library is written so as to never make a system call which might block.
      It uses select() and non-blocking I/O to determine when one of its
      user threads needs to 'block', without actually blocking in the
      kernel.  This was how the original BSD threading library worked.

      This model can also be used to implement M:N, at least to a degree.

      The problem with this model is that most I/O related system calls that
      might be just one system call in a normal program wind up being three
      or four using this model.  The overhead can get nasty.  Plus there are
      other problems... since there is only one real kernel process, POSIX
      signal sharing requirements can cause problems.

    * The threading library implements a 1:1 model.  That is, for every
      user thread created the threading library rfork()'s a new process.
      That way any given thread can 'block' in the kernel without blocking
      the other threads.  This is basically the model that Linux uses.

    * The threading library implements an M:N model where M is the number of
      user threads and N is a dynamic number of kernel contexts. There is 
      still only one process but temporary kernel contexts are created
      whenever a system call might block.  When the system call blocks,
      the kernel performs an 'upcall' to userland to allow it to continue
      running with a new kernel context while the other one is blocked.

      This is the KSE model that FreeBSD implements.

    * The threading library implements an M:NCPUS model where M is the
      number of user threads and NCPUS is the number of cpus in the system.
      The library creates a kernel context for each cpu and manages any
      number of threads using that fixed number of contexts.  This requires
      asynchronous system call support -- the syscall messaging that we
      are slowly implementing in DragonFly.

      This is theoretically the most efficient threading model, barring
      discussions on messaged syscall overhead.  This is the model I would
      eventually like to be using in DragonFly.

    There are other models.

    In any case, the core problem with all of these models is how to handle
    the case where a system call blocks.

    In the M:1 kqueue/select model the thread library tries to maintain 
    total control in order to avoid the case where the kernel might block
    unexpectedly on it. 
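
    As a rough illustration of the M:1 idea, here is a minimal sketch in
    Python (a real threading library would be written in C).  The names
    reader(), run() and demo() are made up for this example, generators
    stand in for user threads, and it assumes a POSIX system where
    select() works on pipes.

```python
import os
import select


def reader(name, fd, log):
    """A user 'thread': yields the fd it would otherwise block on."""
    while True:
        yield fd                      # "block" by returning to the scheduler
        data = os.read(fd, 4096)      # safe: select() reported the fd ready
        if not data:                  # EOF, the writer closed its end
            return
        log.append((name, data.decode()))


def run(threads):
    """M:1 scheduler: one kernel context multiplexes every user thread."""
    waiting = {}                      # fd -> user thread parked on that fd
    for t in threads:
        waiting[next(t)] = t          # run each thread up to its first wait
    while waiting:
        # The only place the process sleeps in the kernel.  Note the cost:
        # select() plus read() where a plain program needs just read().
        ready, _, _ = select.select(list(waiting), [], [])
        for fd in ready:
            t = waiting.pop(fd)
            try:
                waiting[next(t)] = t  # resume until the thread waits again
            except StopIteration:
                os.close(fd)          # the user thread has finished


def demo():
    log = []
    pipes = [os.pipe() for _ in range(3)]
    threads = [reader("t%d" % i, r, log) for i, (r, _) in enumerate(pipes)]
    for i in (2, 0, 1):               # feed the threads out of order
        os.write(pipes[i][1], ("msg%d" % i).encode())
    for _, w in pipes:
        os.close(w)
    run(threads)
    return sorted(log)
```

    Calling demo() drives three user threads to completion from a single
    kernel context.  No read() is issued until select() has said it will
    not block, which is exactly the extra system call overhead described
    above.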

    In the 1:1 model the threading library doesn't try to control blockages,
    it just lets them happen and gives each user thread a kernel context so
    it can block without affecting other threads.
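
    The 1:1 behaviour can be seen from any language whose threads are
    backed by kernel threads.  The sketch below uses Python, relying on
    the fact that CPython's threading.Thread is a native OS thread on
    mainstream platforms; how the kernel context is created underneath
    (rfork() or otherwise) is not visible here.  blocker(), worker() and
    demo() are names invented for the example.

```python
import os
import threading


def demo():
    r, w = os.pipe()
    events = []

    def blocker():
        # Sleeps inside the kernel in read() until data arrives.
        data = os.read(r, 16)
        events.append("blocker got " + data.decode())

    def worker():
        # Runs while blocker is parked in the kernel, then wakes it up.
        events.append("worker ran")
        os.write(w, b"go")

    a = threading.Thread(target=blocker)
    b = threading.Thread(target=worker)
    a.start()
    b.start()
    a.join()
    b.join()
    os.close(r)
    os.close(w)
    return events
```

    The blocking read() in one thread does not stop the other thread
    from running, because each user thread has its own kernel context.
    The threading library does nothing special to make that happen.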

    In the M:N model where N is dynamic (the KSE model), the kernel tries to
    return control to userland when it would otherwise block to allow userland
    to run another thread.

    In the M:NCPUS messaged syscall model, the kernel provides an asynchronous
    system call interface so the userland threading library does not have
    to do any fancy workarounds to maintain control.
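
    The control flow of the messaged model can be mimicked in a toy
    Python simulation.  This is only a sketch of the idea: a thread pool
    stands in for the kernel's asynchronous syscall interface (which
    Python does not expose), slow_call() is a made-up stand-in for a
    system call that would block, and the scheduler shown is just one of
    the NCPUS contexts.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def slow_call(delay):
    """Hypothetical stand-in for a system call that would block."""
    time.sleep(delay)
    return delay


def user_thread(name, delay, log):
    # Send an asynchronous "syscall" message, then yield until the reply.
    reply = yield (slow_call, delay)
    log.append((name, reply))


def run(threads, kernel):
    replies = queue.Queue()            # port on which completions arrive
    for t in threads:
        call, arg = next(t)            # run the thread up to its syscall
        fut = kernel.submit(call, arg) # returns at once; nothing blocks here
        fut.add_done_callback(
            lambda f, t=t: replies.put((t, f.result())))
    for _ in threads:
        t, value = replies.get()       # idle until any reply message arrives
        try:
            t.send(value)              # resume the thread that owns the reply
        except StopIteration:
            pass                       # the user thread ran to completion


def demo():
    log = []
    delays = {"a": 0.03, "b": 0.01, "c": 0.02}
    threads = [user_thread(n, d, log) for n, d in delays.items()]
    with ThreadPoolExecutor(max_workers=len(threads)) as kernel:
        run(threads, kernel)
    return sorted(log)
```

    The scheduler never waits on any one call.  It only waits for the
    next reply message, so all three calls are outstanding at the same
    time even though a single context is driving the user threads.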

    The biggest problem we face in dealing with these models is simply the
    fact that the kernel codebase was not originally designed for massive
    resource sharing or for multiple contexts referencing the same process
    at the same time.  For example, a typical system call mostly assumes
    that elements of the process structure will not be ripped out from
    under it in the middle of the system call if it happens to block.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>