Cache coherency, clustering, and Kernel virtualization
Matthew Dillon
dillon at apollo.backplane.com
Sat Sep 2 20:37:03 PDT 2006
:Just out of curiosity, how do you intend to achieve this goal? My uneducated
:mind leans towards a solution where all page faults are trapped in the actual
:kernel and a lookup table is used to pass the faults up to the actual virtual
:kernel hosting the process.
:
:(I'm not contending it can't be done, I'm just curious as to how it will be
:done).
That's basically how it works. Let's take a uni-processor virtual kernel
just to keep things simple. The virtual kernel runs as a single user
process, say pid 5101. There is a VMSPACE structure in the real kernel
associated with pid 5101 that represents the virtual kernel's address
space.
Each of the user processes operating under the virtual kernel is
also represented by a VMSPACE in the real kernel. That is, since
the virtual kernel cannot directly manipulate the memory map, it
has to issue system calls to the real kernel to manipulate the
memory map for each of its user process contexts. So a virtual
kernel might manipulate, say, 200 real-kernel VMSPACE's
representing 200 user processes running under that virtual kernel.
A VMSPACE is actually a very simple structure in real life. It contains
the vm_map for the user process and the page table and that's basically
it. For example, we use VMSPACE's to 'snapshot' resident programs
(man resident).
From the point of view of the real kernel, there is only *ONE* process,
with only one *ACTIVE* VMSPACE. The virtual kernel's VMSPACE is the
one that runs when the virtual kernel is running. When the virtual
kernel wants to pass control to one of its user processes, it simply
tells the real kernel to swap its VMSPACE with one of the 200 VMSPACE's
it is managing which represent the user process whose context it wants
to pass control to... all in the same real-kernel process (5101). This
is almost exactly equivalent to how the real kernel passes control
to a user process.
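As a rough illustration of the swap described above, here is a toy model in C.
It is only a sketch: host_proc, vmspace_new(), vmspace_enter() and
vmspace_return() are hypothetical names made up for this example, not the
real kernel's interfaces, and a "VMSPACE" is reduced to an integer id.

```c
/* Toy model only: every name here is hypothetical, not a real kernel interface. */
#include <assert.h>
#include <stddef.h>

#define MAX_GUEST_SPACES 200
#define VKERNEL_SPACE    (-1)   /* sentinel: the virtual kernel's own VMSPACE */

/* The single real-kernel process (e.g. pid 5101) and the VMSPACEs tied to it. */
struct host_proc {
    int pid;
    int nspaces;   /* guest VMSPACEs created so far */
    int active;    /* VKERNEL_SPACE, or a guest id in [0, nspaces) */
};

/* Stand-in for a system call that creates an empty VMSPACE for one guest
 * process. Returns the new id, or -1 if the table is full. */
int vmspace_new(struct host_proc *p)
{
    assert(p != NULL);
    if (p->nspaces >= MAX_GUEST_SPACES)
        return -1;
    return p->nspaces++;
}

/* Stand-in for the "swap my VMSPACE" system call. Only the virtual kernel
 * can issue it, so it fails if a guest space is already active. */
int vmspace_enter(struct host_proc *p, int id)
{
    assert(p != NULL);
    if (p->active != VKERNEL_SPACE || id < 0 || id >= p->nspaces)
        return -1;
    p->active = id;
    return 0;
}

/* Control comes back to the virtual kernel: its own VMSPACE is swapped in. */
void vmspace_return(struct host_proc *p)
{
    assert(p != NULL);
    p->active = VKERNEL_SPACE;
}
```

Note that the pid never changes in this model. Only the active field moves
between the virtual kernel's space and one of the guest spaces.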
Remember we have just one cpu here, and the virtual kernel is responsible
for scheduling its own processes. Only one cpu context can actually be
running at a time anyway so there's no point abstracting the virtual
kernel's processes out to the real kernel. From the point of view of the
real kernel there is only one process, period.
The real kernel handles page faults, traps, system calls, and signals
differently when they occur while one of the alternate VMSPACE's is
running. If a page fault, trap, system call, or signal occurs from one
of the alternate VMSPACE's, the real kernel simply swaps the virtual
kernel's VMSPACE back in and passes the information to it rather than
trying to process it itself. So the real kernel's interaction with the
virtual kernel's processes is very minimal. It amounts only to swapping
VMSPACE's and passing the data related to faults, traps, system calls,
and signals to the virtual kernel.
This is what I mean by 'chaining page faults through the virtual kernel'.
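The dispatch rule above could be sketched roughly as follows. This is a toy
model under stated assumptions, not the actual implementation: host_ctx,
vk_event and host_dispatch() are hypothetical names, and the event payload
is collapsed into a single integer.

```c
/* Toy model only: all names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IN_VKERNEL (-1)   /* the virtual kernel's own VMSPACE is active */

enum vk_event_kind { EV_PAGE_FAULT, EV_TRAP, EV_SYSCALL, EV_SIGNAL };

/* What the real kernel hands back to the virtual kernel. */
struct vk_event {
    enum vk_event_kind kind;
    int       space;   /* guest VMSPACE that was running */
    uintptr_t data;    /* fault address, trap, syscall or signal number */
};

struct host_ctx {
    int active;              /* IN_VKERNEL or a guest VMSPACE id */
    int handled_by_host;     /* events the real kernel processed itself */
    int has_pending;         /* 1 if an event is waiting for the vkernel */
    struct vk_event pending;
};

/* Real-kernel entry point for faults, traps, system calls and signals. */
void host_dispatch(struct host_ctx *h, enum vk_event_kind kind, uintptr_t data)
{
    assert(h != NULL);
    if (h->active == IN_VKERNEL) {
        /* Normal path: the virtual kernel is an ordinary user process. */
        h->handled_by_host++;
        return;
    }
    /* Guest path: record the event, swap the virtual kernel's VMSPACE
     * back in, and let the virtual kernel deal with it. */
    h->pending = (struct vk_event){ kind, h->active, data };
    h->has_pending = 1;
    h->active = IN_VKERNEL;
}
```

The point of the sketch is that the guest path does no processing of its
own. It only swaps the space and forwards the data.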
Because the real kernel manages the VMSPACE structures for the virtual
kernel (since the virtual kernel is a user process and cannot manipulate
the MMU itself), the real kernel can wind up removing any page mapping
in those VMSPACE's at any time. If the related address is accessed,
the real kernel passes the fault on to the virtual kernel and it is the
virtual kernel's job to tell the real kernel how to remap the page in
that VMSPACE.
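To make the remap step concrete, here is a small simulation. It assumes,
purely for illustration, that the virtual kernel keeps its own table of
which frame backs each guest page. The names guest_space,
host_drop_mapping(), vk_handle_fault() and guest_access() are hypothetical.

```c
/* Toy model only: all names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define NPAGES   8
#define NO_FRAME (-1)

struct guest_space {
    int host_map[NPAGES];  /* real kernel's mappings, droppable at any time */
    int vk_table[NPAGES];  /* virtual kernel's own record of each page */
    int faults;            /* faults forwarded to the virtual kernel */
};

/* The real kernel removes a mapping, e.g. under memory pressure. */
void host_drop_mapping(struct guest_space *s, int page)
{
    assert(s != NULL && page >= 0 && page < NPAGES);
    s->host_map[page] = NO_FRAME;
}

/* Virtual kernel's fault handler: look the page up in its own table and
 * tell the real kernel how to map it again. */
static int vk_handle_fault(struct guest_space *s, int page)
{
    int frame = s->vk_table[page];
    if (frame == NO_FRAME)
        return -1;               /* the guest really has no such page */
    s->host_map[page] = frame;   /* stands in for a remap system call */
    return 0;
}

/* A guest access: returns the backing frame, or NO_FRAME on a bad access. */
int guest_access(struct guest_space *s, int page)
{
    assert(s != NULL);
    if (page < 0 || page >= NPAGES)
        return NO_FRAME;
    if (s->host_map[page] == NO_FRAME) {
        s->faults++;             /* real kernel forwards the fault */
        if (vk_handle_fault(s, page) != 0)
            return NO_FRAME;
    }
    return s->host_map[page];
}
```

In this model a dropped mapping costs the guest one extra fault and nothing
else, because the virtual kernel's table still knows where the page lives.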
--
In an SMP system the virtual kernel (e.g. process 5101 in our example)
simply rfork()'s itself, so now we might have process 5101 and process
5102, each representing one cpu in a two-cpu virtual system. Everything
else operates the same but of course now we have two 'cpus' and two
real-kernel process contexts so we have the ability to pass control
to two of the virtual kernel's user processes at the same time.
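The SMP case can be modelled the same way. The sketch below does not call
rfork() at all. It only models the result, one real-kernel process context
per virtual cpu, each with its own active VMSPACE. The names vkernel, vcpu,
vkernel_boot(), vcpu_enter() and guests_running() are hypothetical, and the
consecutive pids are an assumption taken from the 5101/5102 example.

```c
/* Toy model only: all names are hypothetical, and rfork() is not called. */
#include <assert.h>
#include <stddef.h>

#define NCPU       2
#define IN_VKERNEL (-1)

/* One virtual cpu is one real-kernel process context. */
struct vcpu {
    int host_pid;   /* e.g. 5101 for cpu 0, 5102 for cpu 1 */
    int active;     /* IN_VKERNEL or the guest VMSPACE id it is running */
};

struct vkernel {
    struct vcpu cpu[NCPU];
};

/* Models the state after the virtual kernel has rfork()'d itself once per
 * extra cpu. Real pids need not be consecutive; that is an assumption. */
void vkernel_boot(struct vkernel *vk, int first_pid)
{
    assert(vk != NULL);
    for (int i = 0; i < NCPU; i++) {
        vk->cpu[i].host_pid = first_pid + i;
        vk->cpu[i].active = IN_VKERNEL;
    }
}

/* Pass control on one virtual cpu to a guest VMSPACE. */
int vcpu_enter(struct vkernel *vk, int cpu, int space)
{
    assert(vk != NULL);
    if (cpu < 0 || cpu >= NCPU || space < 0)
        return -1;
    if (vk->cpu[cpu].active != IN_VKERNEL)
        return -1;
    vk->cpu[cpu].active = space;
    return 0;
}

/* How many guest processes are executing right now. */
int guests_running(const struct vkernel *vk)
{
    assert(vk != NULL);
    int n = 0;
    for (int i = 0; i < NCPU; i++)
        if (vk->cpu[i].active != IN_VKERNEL)
            n++;
    return n;
}
```

With NCPU set to 1 this collapses back to the uni-processor case, where at
most one guest process can be running at a time.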
-Matt