[GSOC] Implement hardware nested page table support for vkernels

Mihai Carabas mihai.carabas at gmail.com
Sun Aug 25 13:45:37 PDT 2013


This week I've been looking over the pmap code, comparing the generic x86
PG_* bits with the EPT ones and trying to find a common API between them.
Neel Natu also pointed me to his FreeBSD work on EPT support in pmap, which
made it easier to find that common API.

I have rewritten the pmap code to support custom PG_* bits by adding a
field for them to "struct pmap" (pmap->pmap_bits). One particular group of
these bits is the cache bits. I've also added the pte_pat_index vector to
the pmap, under the name "pmap->pmap_cache_bits". This vector is indexed by
caching type; each entry contains the bits that must be set in the PTE
(page table entry) to obtain that caching type. I've also added a field,
"pmap->pmap_cache_mask", which identifies all the cache bits in a PTE. I
saw that in Dfly we don't use the PDE cache bits to map bigger pages (>4k).
This removes the need for multiple masks depending on the page size being
mapped.
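
To make the layout above concrete, here is a minimal userspace sketch of a
per-pmap bit table. Only the field names pmap_bits, pmap_cache_bits and
pmap_cache_mask come from the description above; the struct name, the index
and cache-mode enums and both helper functions are hypothetical, and the
cache encodings assume the power-on default PAT rather than whatever the
kernel actually programs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical index names for the per-pmap bit table. */
enum pmap_bit_index {
    PG_V_IDX, PG_RW_IDX, PG_U_IDX, PG_A_IDX, PG_M_IDX, PG_G_IDX,
    PG_BITS_SIZE
};

/* Illustrative subset of caching types used to index pmap_cache_bits. */
enum cache_mode {
    CACHE_WRITE_BACK, CACHE_WRITE_THROUGH, CACHE_UNCACHEABLE,
    CACHE_MODE_COUNT
};

/* Only the fields discussed above; the real struct pmap has many more. */
struct pmap_sketch {
    uint64_t pmap_bits[PG_BITS_SIZE];
    uint64_t pmap_cache_bits[CACHE_MODE_COUNT];
    uint64_t pmap_cache_mask;
};

/* Fill the table with the regular x86-64 4K PTE layout. */
static void
pmap_sketch_init_x86(struct pmap_sketch *pmap)
{
    const uint64_t pwt = 1ULL << 3, pcd = 1ULL << 4, pat = 1ULL << 7;

    pmap->pmap_bits[PG_V_IDX]  = 1ULL << 0;
    pmap->pmap_bits[PG_RW_IDX] = 1ULL << 1;
    pmap->pmap_bits[PG_U_IDX]  = 1ULL << 2;
    pmap->pmap_bits[PG_A_IDX]  = 1ULL << 5;
    pmap->pmap_bits[PG_M_IDX]  = 1ULL << 6;
    pmap->pmap_bits[PG_G_IDX]  = 1ULL << 8;

    /* Assumes the power-on default PAT: entry 0 = WB, 1 = WT, 3 = UC. */
    pmap->pmap_cache_bits[CACHE_WRITE_BACK]    = 0;
    pmap->pmap_cache_bits[CACHE_WRITE_THROUGH] = pwt;
    pmap->pmap_cache_bits[CACHE_UNCACHEABLE]   = pcd | pwt;
    pmap->pmap_cache_mask = pwt | pcd | pat;
}

/* Replace the caching type of a PTE without touching its other bits. */
static uint64_t
pte_set_cache(const struct pmap_sketch *pmap, uint64_t pte,
    enum cache_mode mode)
{
    assert((unsigned)mode < (unsigned)CACHE_MODE_COUNT);
    return (pte & ~pmap->pmap_cache_mask) | pmap->pmap_cache_bits[mode];
}
```

Code that goes through the table instead of hard-coded PG_* constants never
needs to know which page table format the pmap uses; only the init routine
differs between formats.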

FreeBSD uses PDEs for bigger pages, so it has to handle multiple cases
(the PAT bits corresponding to a PDE are different from the PAT bits
corresponding to a PTE).

I've removed all the PG_* bits and replaced them with the pmap private
members. The system still seems to work.

I've also built the EPT bits like this:
- PG_V - valid (EPT_READ and EPT_EXECUTE bits are set); the normal map has
no separate bits for these, only the VALID bit
- PG_RW - read/write (EPT_READ, EPT_EXECUTE and EPT_WRITE)
- PG_U - no EPT equivalent (0)
- PG_A - EPT_A (bit8)
- PG_M - EPT_M (bit9)
- PG_G - no EPT equivalent (0)
- PG_AVAIL1 - moved from bit9 to bit10
- PG_AVAIL2 - moved from bit10 to bit11
- PG_AVAIL3 - moved from bit11 to bit52 (the first free)
- bit3-bit6 control the cache type. I've made an array like pte_pat_index
in which each cache type has the right bits in place
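
The mapping in the list above could be written down roughly as follows.
This is a sketch, not the code from my tree: the macro names, the cache
enum and ept_make_leaf() are made up for illustration. The bit positions
follow the list, and the memory type encodings (UC=0, WC=1, WT=4, WP=5,
WB=6 in bits 3-5) are the ones the Intel manual defines for EPT leaf
entries.

```c
#include <assert.h>
#include <stdint.h>

/* Raw EPT entry bits. */
#define EPT_READ       (1ULL << 0)
#define EPT_WRITE      (1ULL << 1)
#define EPT_EXECUTE    (1ULL << 2)
#define EPT_A          (1ULL << 8)
#define EPT_M          (1ULL << 9)

/* Generic PG_* meanings expressed in EPT terms (illustrative names). */
#define EPT_PG_V       (EPT_READ | EPT_EXECUTE)
#define EPT_PG_RW      (EPT_READ | EPT_EXECUTE | EPT_WRITE)
#define EPT_PG_U       0ULL            /* no EPT equivalent */
#define EPT_PG_G       0ULL            /* no EPT equivalent */
#define EPT_PG_AVAIL1  (1ULL << 10)
#define EPT_PG_AVAIL2  (1ULL << 11)
#define EPT_PG_AVAIL3  (1ULL << 52)

#define EPT_CACHE_MASK (0xFULL << 3)   /* bit3-bit6 */
#define EPT_FRAME      0x000FFFFFFFFFF000ULL

/* Hypothetical cache-type index, playing the role of pte_pat_index. */
enum ept_cache_mode {
    EPT_CACHE_UC, EPT_CACHE_WC, EPT_CACHE_WT, EPT_CACHE_WP, EPT_CACHE_WB,
    EPT_CACHE_COUNT
};

/* EPT memory type value shifted into bits 3-5 of a leaf entry. */
static const uint64_t ept_cache_bits[EPT_CACHE_COUNT] = {
    [EPT_CACHE_UC] = 0ULL << 3,
    [EPT_CACHE_WC] = 1ULL << 3,
    [EPT_CACHE_WT] = 4ULL << 3,
    [EPT_CACHE_WP] = 5ULL << 3,
    [EPT_CACHE_WB] = 6ULL << 3,
};

/* Build a 4K EPT leaf entry for physical address pa. */
static uint64_t
ept_make_leaf(uint64_t pa, int writable, enum ept_cache_mode mode)
{
    assert((unsigned)mode < (unsigned)EPT_CACHE_COUNT);
    uint64_t pte = (pa & EPT_FRAME) | EPT_PG_V;

    if (writable)
        pte |= EPT_PG_RW;
    return pte | (ept_cache_bits[mode] & EPT_CACHE_MASK);
}
```

For example, ept_make_leaf(0x2000, 1, EPT_CACHE_WB) yields 0x2037: the
frame, the three permission bits and memory type 6 in bits 3-5.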

Now we have a vkernel with a vmspace (including the pmap) in the regular
format. I've built a function that converts this table to EPT format (only
the USERSPACE side), but I'm losing some cache information (the EPT PDE
must have its cache bits set to 0), so I can't convert it back. One idea
is to call vmspace_fork, which creates a new vmspace with the same vm_maps
and an empty pmap, and then replace the private members with the EPT ones.
The pmap will then be populated with the new bits at each page fault. It
would be great if we could keep the two vmspaces synchronized (one for EPT
and one for normal use). That way I wouldn't have to rewrite the
copyin/copyout procedures. I'm still thinking about which approach would
be best.
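
The loss of cache information can be shown with a toy round trip on a
single non-leaf PDE. Both function names are hypothetical and this is not
the conversion function mentioned above, only a sketch of the same idea.
It assumes that PG_U can be restored unconditionally because only the
userspace side is converted.

```c
#include <assert.h>
#include <stdint.h>

/* Regular x86-64 PDE bits. */
#define X86_PG_V    (1ULL << 0)
#define X86_PG_RW   (1ULL << 1)
#define X86_PG_U    (1ULL << 2)
#define X86_PG_PWT  (1ULL << 3)
#define X86_PG_PCD  (1ULL << 4)
#define X86_PG_PS   (1ULL << 7)

/* EPT permission bits. */
#define EPT_READ    (1ULL << 0)
#define EPT_WRITE   (1ULL << 1)
#define EPT_EXECUTE (1ULL << 2)

#define PG_FRAME    0x000FFFFFFFFFF000ULL

/* Convert a PDE that points to a page table into EPT format. */
static uint64_t
x86_pde_to_ept(uint64_t pde)
{
    assert((pde & X86_PG_PS) == 0);    /* non-leaf entries only */
    if ((pde & X86_PG_V) == 0)
        return 0;

    uint64_t ept = (pde & PG_FRAME) | EPT_READ | EPT_EXECUTE;
    if (pde & X86_PG_RW)
        ept |= EPT_WRITE;
    /* Bits 3-6 stay 0 in the EPT PDE, so PWT/PCD are dropped here. */
    return ept;
}

/* Best-effort reverse conversion; the cache bits cannot be recovered. */
static uint64_t
ept_pde_to_x86(uint64_t ept)
{
    if ((ept & EPT_READ) == 0)
        return 0;

    /* Assumption: userspace-only table, so PG_U is always set. */
    uint64_t pde = (ept & PG_FRAME) | X86_PG_V | X86_PG_U;
    if (ept & EPT_WRITE)
        pde |= X86_PG_RW;
    return pde;
}
```

A PDE with PCD set comes back from the round trip without it, which is why
the converted table cannot simply be converted back in place.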

Other things I still need to address:
- the invalidation instruction [invept] (I should invalidate the EPT with a
special call when appropriate)
- emulating the A/D bits. Unfortunately my testing hardware does not
support A/D bits for EPT. I've looked over the FreeBSD implementation and
I will probably borrow the mechanisms from there.
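
Whether the hardware supports invept and the EPT A/D bits is reported by
the IA32_VMX_EPT_VPID_CAP MSR (0x48C). The sketch below only decodes a
value that has already been read from that MSR, since reading it requires
ring 0; the struct and function names are hypothetical, while the bit
positions are the ones documented in the Intel manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_VMX_EPT_VPID_CAP  0x48C

#define EPT_CAP_INVEPT         (1ULL << 20)  /* invept is available */
#define EPT_CAP_AD_BITS        (1ULL << 21)  /* hardware A/D flags */
#define EPT_CAP_INVEPT_SINGLE  (1ULL << 25)  /* single-context invept */
#define EPT_CAP_INVEPT_ALL     (1ULL << 26)  /* all-context invept */

/* invept type operands. */
#define INVEPT_TYPE_NONE    0
#define INVEPT_TYPE_SINGLE  1
#define INVEPT_TYPE_ALL     2

struct ept_caps {
    bool invept;
    bool invept_single;
    bool invept_all;
    bool ad_bits;
};

/* Decode a raw IA32_VMX_EPT_VPID_CAP value. */
static struct ept_caps
ept_decode_caps(uint64_t msr_value)
{
    struct ept_caps caps = {
        .invept        = (msr_value & EPT_CAP_INVEPT) != 0,
        .invept_single = (msr_value & EPT_CAP_INVEPT_SINGLE) != 0,
        .invept_all    = (msr_value & EPT_CAP_INVEPT_ALL) != 0,
        .ad_bits       = (msr_value & EPT_CAP_AD_BITS) != 0,
    };
    return caps;
}

/* Prefer the narrower single-context flush when it is available. */
static int
ept_pick_invept_type(const struct ept_caps *caps)
{
    assert(caps != NULL);
    if (!caps->invept)
        return INVEPT_TYPE_NONE;
    if (caps->invept_single)
        return INVEPT_TYPE_SINGLE;
    if (caps->invept_all)
        return INVEPT_TYPE_ALL;
    return INVEPT_TYPE_NONE;
}
```

When ad_bits comes back false, as on my test machine, the A/D bits have to
be emulated in software.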

Nothing has been tested with EPT yet. I hope to run some tests tomorrow to
see what errors EPT throws at me :).
