Cache coherency, clustering, and Kernel virtualization

Pieter Dumon pieter.dumon at gmail.com
Sun Sep 3 04:13:36 PDT 2006


Hi,
I'm afraid I'm not fully getting this, so this is probably a stupid question.
In the end it will be possible to cluster the real kernels, right? Or
are you saying you will always use userland kernels to partition
resources and "donate" resources to a cluster?
(The idea of userland kernels is really attractive as a development
tool and equally in terms of partitioning resources of course).
Pieter

On 9/3/06, Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:
:Just out of curiosity, how do you intend to achieve this goal?  My uneducated
:mind leans towards a solution where all page faults are trapped in the actual
:kernel and a lookup table is used to pass the faults up to the actual virtual
:kernel hosting the process.
:
:(I'm not contending it can't be done, I'm just curious as to how it will be
:done).

    That's basically how it works.  Let's take a uni-processor virtual kernel
    just to keep things simple.  The virtual kernel runs as a single user
    process, say pid 5101.  There is a VMSPACE structure in the real kernel
    associated with pid 5101 that represents the virtual kernel's address
    space.

    Each of the user processes operating under the virtual kernel is
    also represented by a VMSPACE in the real kernel.   That is, since
    the virtual kernel cannot directly manipulate the memory map, it
    has to issue system calls to the real kernel to manipulate the
    memory map for each of its user process contexts.  So a virtual
    kernel might manipulate, say, 200 real-kernel VMSPACE's
    representing 200 user processes running under that virtual kernel.

    A VMSPACE is actually a very simple structure in real life.  It contains
    the vm_map for the user process and the page table and that's basically
    it.  For example, we use VMSPACE's to 'snapshot' resident programs
    (man resident).
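
    As a rough illustration of the bookkeeping described above, here is a
    toy Python model, not DragonFly code.  The names VMSpace, ToyRealKernel,
    sys_vmspace_create and sys_vmspace_map are made up for this sketch;
    they only stand in for "a VMSPACE holds a vm_map and a page table" and
    "the virtual kernel must ask the real kernel to change a mapping".

```python
from dataclasses import dataclass, field


@dataclass
class VMSpace:
    """Toy stand-in for a real-kernel VMSPACE: a vm_map plus a page table."""
    vm_map: dict = field(default_factory=dict)      # start address -> (length, protection)
    page_table: dict = field(default_factory=dict)  # virtual page -> physical page


class ToyRealKernel:
    """Hypothetical real kernel that owns every VMSPACE of one host process."""

    def __init__(self):
        self.spaces = {}       # handle -> VMSpace
        self.next_handle = 1

    def sys_vmspace_create(self):
        # Made-up "system call": allocate an empty VMSPACE and return a handle.
        handle = self.next_handle
        self.next_handle += 1
        self.spaces[handle] = VMSpace()
        return handle

    def sys_vmspace_map(self, handle, vpage, ppage):
        # Made-up "system call": install one page mapping in a given VMSPACE.
        self.spaces[handle].page_table[vpage] = ppage


real = ToyRealKernel()
# The virtual kernel (a single host process) asks for one VMSPACE per guest process.
guest_spaces = [real.sys_vmspace_create() for _ in range(200)]
real.sys_vmspace_map(guest_spaces[0], vpage=0x10, ppage=0x9000)
```

    The point of the sketch is only that all 200 structures live on the
    real-kernel side and are reached through handles; the virtual kernel
    never touches a page table directly.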

    From the point of view of the real kernel, there is only *ONE* process,
    with only one *ACTIVE* VMSPACE.  The virtual kernel's VMSPACE is the
    one that runs when the virtual kernel is running.  When the virtual
    kernel wants to pass control to one of its user processes, it simply
    tells the real kernel to swap its VMSPACE with one of the 200 VMSPACE's
    it is managing which represent the user process whose context it wants
    to pass control to... all in the same real-kernel process (5101).  This
    is almost exactly equivalent to how the real kernel passes control
    to a user process.
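
    The swap described above can be pictured with another small toy model
    (again Python, again hypothetical names such as HostProcess and
    switch_to_guest).  It assumes nothing beyond the text: one host pid,
    one active VMSPACE at a time, and control passed by changing which
    VMSPACE is active.

```python
class HostProcess:
    """Toy model of the single real-kernel process (e.g. pid 5101) behind a virtual kernel."""

    HOME = "vkernel"  # label for the virtual kernel's own VMSPACE

    def __init__(self, pid, guest_names):
        self.pid = pid
        self.guests = set(guest_names)
        self.active = self.HOME   # exactly one VMSPACE is active at any moment

    def switch_to_guest(self, name):
        # Pass control to a guest process by making its VMSPACE the active one.
        if name not in self.guests:
            raise KeyError(f"unknown guest VMSPACE: {name}")
        self.active = name

    def return_to_vkernel(self):
        # Swap the virtual kernel's own VMSPACE back in.
        self.active = self.HOME


proc = HostProcess(5101, [f"guest-{i}" for i in range(200)])
proc.switch_to_guest("guest-42")
running = (proc.pid, proc.active)   # same pid, different address space
proc.return_to_vkernel()
```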

    Remember we have just one cpu here, and the virtual kernel is responsible
    for scheduling its own processes.  Only one cpu context can actually be
    running at a time anyway so there's no point abstracting the virtual
    kernel's processes out to the real kernel.  From the point of view of the
    real kernel there is only one process, period.

    The real kernel handles page faults, traps, system calls, and signals
    differently when they occur while one of the alternate VMSPACE's is
    running.  If a page fault, trap, system call, or signal occurs from one
    of the alternate VMSPACE's, the real kernel simply swaps the virtual
    kernel's VMSPACE back in and passes the information to it rather than
    trying to process it itself.  So the real kernel's interaction with the
    virtual kernel's processes is very minimal.  It amounts only to swapping
    VMSPACE's and passing the data related to faults, traps, system calls,
    and signals to the virtual kernel.
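
    A minimal sketch of that dispatch rule, as a toy Python model with
    invented names (ForwardingKernel, on_event): if an event arrives while
    an alternate VMSPACE is active, swap the virtual kernel back in and
    hand the event over; otherwise the real kernel deals with it itself.

```python
VKERNEL = "vkernel"   # label for the virtual kernel's own VMSPACE


class ForwardingKernel:
    """Hypothetical real kernel that forwards guest events to the virtual kernel."""

    def __init__(self):
        self.active = VKERNEL   # which VMSPACE is currently installed
        self.forwarded = []     # events handed to the virtual kernel
        self.handled = []       # events the real kernel processed itself

    def run_guest(self, name):
        self.active = name

    def on_event(self, kind, detail):
        if self.active != VKERNEL:
            origin = self.active
            self.active = VKERNEL   # swap the virtual kernel's VMSPACE back in
            self.forwarded.append((origin, kind, detail))
        else:
            self.handled.append((kind, detail))


rk = ForwardingKernel()
rk.run_guest("guest-7")
rk.on_event("syscall", "write")   # came from a guest: forwarded, not processed
rk.on_event("syscall", "read")    # issued by the virtual kernel itself: handled normally
```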

    This is what I mean by 'chaining page faults through the virtual kernel'.
    Because the real kernel manages the VMSPACE structures for the virtual
    kernel (since the virtual kernel is a user process and cannot manipulate
    the MMU itself), the real kernel can wind up removing any page mapping
    in those VMSPACE's at any time.  If the related address is accessed,
    the real kernel passes the fault on to the virtual kernel and it is the
    virtual kernel's job to tell the real kernel how to remap the page in
    that VMSPACE.
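
    The fault chaining can be sketched the same way.  This is a toy Python
    model under stated assumptions: GuestSpace, ToyVirtualKernel and
    guest_access are hypothetical names, the 'backing' dictionary stands in
    for whatever bookkeeping the virtual kernel keeps, and the assignment in
    handle_fault stands in for the remap request to the real kernel.

```python
class PageFault(Exception):
    """Raised when a guest touches a page that has no mapping."""

    def __init__(self, vpage):
        super().__init__(f"fault on page {vpage:#x}")
        self.vpage = vpage


class GuestSpace:
    """Toy guest VMSPACE whose page table lives in the (simulated) real kernel."""

    def __init__(self):
        self.page_table = {}

    def access(self, vpage):
        if vpage not in self.page_table:
            raise PageFault(vpage)
        return self.page_table[vpage]


class ToyVirtualKernel:
    """Hypothetical virtual kernel that knows the intended mapping for each page."""

    def __init__(self, backing):
        self.backing = backing   # vpage -> ppage, the virtual kernel's own records
        self.faults_seen = 0

    def handle_fault(self, space, vpage):
        self.faults_seen += 1
        space.page_table[vpage] = self.backing[vpage]   # stands in for a remap call


def guest_access(space, vk, vpage):
    """Real-kernel side: try the access, chain any fault to the virtual kernel, retry."""
    try:
        return space.access(vpage)
    except PageFault as fault:
        vk.handle_fault(space, fault.vpage)
        return space.access(vpage)


space = GuestSpace()
vk = ToyVirtualKernel(backing={0x10: 0x9000})
first = guest_access(space, vk, 0x10)    # no mapping yet: fault, remap, retry
space.page_table.clear()                 # the real kernel drops the mapping at will
second = guest_access(space, vk, 0x10)   # faults again and is remapped
```

    Dropping the mapping costs an extra fault but loses nothing, because
    the virtual kernel can always say how the page should be remapped.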
    --

    In an SMP system the virtual kernel (e.g. process 5101 in our example)
    simply rfork()'s itself, so now we might have process 5101 and process
    5102, each representing one cpu in a two-cpu virtual system.  Everything
    else operates the same but of course now we have two 'cpus' and two
    real-kernel process contexts so we have the ability to pass control
    to two of the virtual kernel's user processes at the same time.
                                                -Matt






