[GSOC] Implement hardware nested page table support for vkernels

Mihai Carabas mihai.carabas at gmail.com
Mon Jul 29 04:44:01 PDT 2013


> I have some problems with the stack mapping (I get some weird page-faults
> at address 0 when accessing the stack - I missed something about the stack
> growing I guess). I will investigate this issue in order to go further and
> run the vkernel process in the GUEST context.
The problem was not the stack. I had introduced a silly bug in my ASM code
where RDI (the first parameter register in the x86-64 calling convention)
and R11 were saved to the same memory location. Thus, RDI was overwritten
with a bogus value on restore.
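
A slot collision like this can be caught at compile time. Below is a minimal
sketch, assuming a hypothetical save-area layout: struct guest_regs, the
GUEST_*_OFF macros and slots_overlap() are illustrative names, not taken
from the actual VMM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register save area; field names and order are illustrative. */
struct guest_regs {
    uint64_t rdi;   /* first integer argument in the x86-64 calling convention */
    uint64_t rsi;
    uint64_t rdx;
    uint64_t rcx;
    uint64_t r8;
    uint64_t r9;
    uint64_t r10;
    uint64_t r11;
};

/* Offsets that the assembly save/restore code would use (assumed names). */
#define GUEST_RDI_OFF offsetof(struct guest_regs, rdi)
#define GUEST_R11_OFF offsetof(struct guest_regs, r11)

/* Fail the build if two registers end up sharing a slot. */
static_assert(GUEST_RDI_OFF != GUEST_R11_OFF,
              "RDI and R11 must not share a save slot");
static_assert(sizeof(struct guest_regs) == 8 * sizeof(uint64_t),
              "unexpected padding in save area");

/* Returns 1 if the 8-byte slots starting at offsets a and b overlap. */
static int
slots_overlap(size_t a, size_t b)
{
    return a < b + 8 && b < a + 8;
}
```

The checks only help if the assembly takes its offsets from the same
definitions instead of hard-coded numbers.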

After solving the issue above, simple programs ran fine in the VMX non-root
context, so I started modifying the vkernel code to run there as well.
First of all, the vkernel needed the cpuid instruction to be emulated (this
was straightforward to implement). I also had to intercept the set_tls_area
syscall in order to configure the %gs and %fs bases as needed. Then several
issues came up with the signal handling syscalls, which set RIP to a custom
value. If you remember from my last e-mail, I had to increment RIP by the
size of the syscall ASM instruction in order to step over it. If RIP got
changed by the syscall, I shouldn't have been modifying it; the increment
has to happen when returning to the original syscall ASM instruction.

I lost two days on these issues because, at first, the fault reported by
the debugger indicated something about the %fs base being 0. I started
investigating this (it wasn't easy, because I couldn't read the %fs base
from userspace, but with the help of vsrinivas and dillon I managed to do
it). After confirming that %fs was ok, I started printing out the RIP/RSP
of the VM context and then disassembled the binaries to see at which
instructions the code started to act abnormally. This led me to the signal
handling syscalls that were modifying RIP. On top of that, while handling a
signal, another signal can be raised and handled before sigreturn is
called.
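
The RIP rule described above can be sketched as follows. This is only an
illustration under assumptions: struct guest_state, emulate_syscall() and
the two sample handlers are hypothetical names, and the sketch covers only
the "do not advance RIP if the syscall redirected it" half of the problem,
not how a context restored by sigreturn is treated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSCALL_INSN_LEN 2      /* the x86-64 "syscall" opcode is 0F 05 */

/* Hypothetical guest state; only RIP matters for this sketch. */
struct guest_state {
    uint64_t rip;
};

typedef void (*syscall_handler_t)(struct guest_state *);

/* Sample handler that leaves RIP alone (an ordinary syscall). */
static void
handler_plain(struct guest_state *gs)
{
    (void)gs;
}

/* Sample handler that redirects RIP, as a signal-related syscall might. */
static void
handler_redirect(struct guest_state *gs)
{
    gs->rip = 0x5000;
}

/*
 * Run the handler, then step over the syscall instruction only if the
 * handler did not install a new RIP itself.
 */
static void
emulate_syscall(struct guest_state *gs, syscall_handler_t handler)
{
    uint64_t rip_before;

    assert(gs != NULL && handler != NULL);
    rip_before = gs->rip;
    handler(gs);
    if (gs->rip == rip_before)
        gs->rip += SYSCALL_INSN_LEN;
}
```

Comparing RIP before and after is a simplification: a handler that
deliberately sets RIP to the same address would be misread, so a real
implementation would more likely use an explicit flag.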

Going further, some faults appeared due to bogus %fs/%gs bases. This was
because set_tls_area was not the only syscall modifying the base addresses:
vmspace_ctl was also calling the syscall function directly from the kernel.
To solve this, I removed the set_tls_area hook from my VMM module and put
it in the code of the system call itself. At this point the init process
started, but the /bin/sh process created by init was killed with signal 12
(undefined syscall). This happened because, for the moment, I don't check
the instruction opcode when I get a UD (undefined instruction) fault; I
just assume it is a syscall. I investigated my VMM logs and found the RIP
that was causing this. Then I disassembled the binaries again and saw that
the "cvttsd2si" instruction was being executed. I checked the manual and
saw that I hadn't enabled CR4_FXSR and CR4_XMM, which caused the UD fault.
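
For reference, the two CR4 bits involved can be sketched like this. The bit
positions are the architectural ones (OSFXSR is bit 9, OSXMMEXCPT is
bit 10); the helper functions are hypothetical and only show the masking,
not how the value is written into the guest state.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_FXSR 0x00000200     /* CR4.OSFXSR: OS supports FXSAVE/FXRSTOR and SSE */
#define CR4_XMM  0x00000400     /* CR4.OSXMMEXCPT: OS handles SIMD FP exceptions */

static_assert(CR4_FXSR == (1u << 9), "OSFXSR is bit 9 of CR4");
static_assert(CR4_XMM == (1u << 10), "OSXMMEXCPT is bit 10 of CR4");

/* Hypothetical helper: return a guest CR4 value with both SSE bits set. */
static uint64_t
guest_cr4_enable_sse(uint64_t cr4)
{
    return cr4 | CR4_FXSR | CR4_XMM;
}

/* Hypothetical helper: returns 1 only if both bits are already set. */
static int
guest_cr4_sse_ready(uint64_t cr4)
{
    return (cr4 & (CR4_FXSR | CR4_XMM)) == (CR4_FXSR | CR4_XMM);
}
```

With OSFXSR clear, SSE/SSE2 instructions such as cvttsd2si raise UD, which
matches the fault seen in /bin/sh.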

At this point I have a single-core vkernel running in VMX non-root context,
without sendmail. Sendmail is throwing a UD fault; I will investigate today
and see which instruction is missing. I will also implement the check on UD
faults (whether the opcode is "syscall" or anything else). Another task is
modifying the vkernel a bit further so that it can run with multiple cores.
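
A minimal sketch of the planned opcode check, assuming the instruction
bytes at the faulting RIP have already been copied into a buffer (fetching
them from guest memory is not shown). classify_ud() and enum ud_cause are
hypothetical names; instruction prefixes are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ud_cause {
    UD_SYSCALL,     /* the faulting instruction is "syscall" (0F 05) */
    UD_OTHER        /* anything else: a genuine undefined instruction */
};

/* Classify a UD fault from the bytes found at the faulting RIP. */
static enum ud_cause
classify_ud(const uint8_t *insn, size_t len)
{
    assert(insn != NULL);
    if (len >= 2 && insn[0] == 0x0F && insn[1] == 0x05)
        return UD_SYSCALL;
    return UD_OTHER;
}
```

With a check like this, only UD_SYSCALL would go through the syscall
emulation path; UD_OTHER could be logged together with the opcode bytes,
which would have pointed at cvttsd2si right away.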

I also need to check whether the cothreads (the ones that handle I/O) need
to run in the VMX non-root context once I start implementing EPT. Another
thing to investigate is whether the migration of the VMX thread from one
CPU to another is handled correctly (I saw some failures, but they weren't
reproducible).

That is all for now. Will keep you in touch with any new progress as I get to
