Page fault handling in vpagetable area clarification
dillon at apollo.backplane.com
Sat Feb 10 12:16:41 PST 2007
:I'll keep the technical questions public so that web searches can find them.
:In vm_fault_object(), fs.prot gets downgraded unconditionally if this is a
:VM_MAPTYPE_VPAGETABLE entry. But if this was a write fault,
:vm_fault_vpagetable() has already set VPTE_M (and if the vkernel clears it,
:its pmap_clearbit() will invalidate the real kernel's pagetables). Why can't
:the protection stay RW in this case?
fs.prot is only downgraded for read faults on writable pages. Write
faults on writable pages will be mapped RW.
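As a rough illustration of that rule, here is a minimal sketch in C. It is a toy model, not the actual vm_fault_object() code: the `TOY_PROT_*` constants and the `vpt_map_prot()` helper are hypothetical names introduced only for this example.

```c
#include <stdbool.h>

/* Toy protection bits, defined locally so the sketch stands alone. */
#define TOY_PROT_READ  0x1
#define TOY_PROT_WRITE 0x2

/*
 * Hypothetical helper: choose the protection to install in the real page
 * table for a fault on memory governed by a virtual page table.
 *   fault_type - TOY_PROT_READ for a read fault, TOY_PROT_WRITE for a write
 *   page_prot  - protection the page itself allows
 */
int
vpt_map_prot(int fault_type, int page_prot)
{
    bool write_fault = (fault_type & TOY_PROT_WRITE) != 0;
    bool writable = (page_prot & TOY_PROT_WRITE) != 0;

    if (writable && !write_fault)
        return page_prot & ~TOY_PROT_WRITE; /* read fault: downgrade to RO */
    return page_prot;                       /* write fault: keep RW */
}
```

Under these assumptions, a read fault on a read-write page yields a read-only mapping, while a write fault on the same page keeps the read-write protection.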
Normally (that is, for memory not governed by a virtual page table),
when a read fault occurs on a writable page the kernel will
map the page read-write, but still mark the page as being clean in its
vm_page structure. Any future write to that page will cause the
hardware page table's modified bit to be set. The real kernel lazily
checks the modified bit in the hardware page table entry at some future
time to determine if the page is actually still clean or not.
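The lazy check described above can be sketched as a small toy model. Everything here is an assumption made for illustration: `struct toy_page`, the `TOY_PTE_*` bit values, and the three functions are hypothetical and do not correspond to real kernel structures or to a specific hardware layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTE_RW 0x02u /* mapping allows writes (toy value) */
#define TOY_PTE_M  0x40u /* "modified" bit, set by hardware on a write */

struct toy_page {
    uint32_t pte;   /* stand-in for the hardware page table entry */
    bool     dirty; /* stand-in for the dirty state kept in vm_page */
};

/* Read fault on a writable page, normal case: map RW, leave page clean. */
void
toy_read_fault(struct toy_page *p)
{
    p->pte = TOY_PTE_RW;
    p->dirty = false;
}

/* What the MMU does on a store through an RW mapping: no fault, just set M. */
void
toy_hw_write(struct toy_page *p)
{
    if (p->pte & TOY_PTE_RW)
        p->pte |= TOY_PTE_M;
}

/* Lazy check done later by the kernel: fold the hardware M bit into dirty. */
bool
toy_page_is_dirty(struct toy_page *p)
{
    if (p->pte & TOY_PTE_M) {
        p->dirty = true;
        p->pte &= ~TOY_PTE_M;
    }
    return p->dirty;
}
```

The point of the model is that the write itself never enters the kernel; the dirty state only becomes visible when the kernel gets around to looking at the page table entry.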
In order to properly simulate the setting of the modified bit in the
virtual page table, read faults on writable pages within the VM space
governed by the virtual page table must be mapped read-only instead
of read-write. That forces an actual write fault to occur if the
page is later written. Otherwise the real kernel has no way of knowing
when to set the modified bit in the virtual page table entry.
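The resulting two-fault sequence can be sketched with another toy model. This is only an illustration under stated assumptions: `struct sim_mapping`, `sim_fault()`, `sim_access()` and the `SIM_VPTE_*` values are hypothetical, and `SIM_VPTE_M` merely stands in for the real VPTE_M flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_VPTE_RW 0x02u /* virtual page table says the page is writable */
#define SIM_VPTE_M  0x40u /* modified bit in the virtual page table (toy value) */

struct sim_mapping {
    uint32_t vpte;       /* entry in the virtual page table */
    bool     mapped;     /* is there a mapping in the real page table? */
    bool     real_write; /* does the real mapping allow writes? */
    int      faults;     /* number of faults the real kernel has handled */
};

/* Real kernel's fault handler for memory governed by a virtual page table. */
void
sim_fault(struct sim_mapping *m, bool is_write)
{
    m->faults++;
    m->mapped = true;
    if (is_write && (m->vpte & SIM_VPTE_RW)) {
        m->vpte |= SIM_VPTE_M; /* the real kernel sees the write: record it */
        m->real_write = true;  /* and map the page read-write */
    } else {
        m->real_write = false; /* read fault: map read-only */
    }
}

/* An access faults only if the real mapping does not already permit it. */
void
sim_access(struct sim_mapping *m, bool is_write)
{
    if (!m->mapped || (is_write && !m->real_write))
        sim_fault(m, is_write);
}
```

In this model a read followed by a write to the same page costs two faults, and the modified bit is set on the second one. Later writes go through the read-write mapping without faulting.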
That is what is supposed to happen. If it doesn't happen that way
please tell me :-). There are some optimizations that can be done
to reduce the number of 'double' page faults that occur. For example,
the modified bit in the virtual page table entry could be set
proactively on a read fault, based on some heuristic or cached
historical data (where the expectation is that the page will be
modified soon even though the immediate fault was a read fault), so
that the page can be mapped read-write immediately.
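One possible shape for such a heuristic is sketched below. It is purely hypothetical: the post does not say which heuristic would be used, and the per-page counter, the threshold value, and all `heur_*` / `HEUR_*` names are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEUR_VPTE_RW   0x02u /* page is writable (toy value) */
#define HEUR_VPTE_M    0x40u /* modified bit (toy value) */
#define HEUR_THRESHOLD 2     /* arbitrary cut-off chosen for the example */

struct heur_page {
    uint32_t vpte;             /* entry in the virtual page table */
    int      write_after_read; /* past read faults followed by a write fault */
};

/*
 * Read fault: returns true if the page should be mapped read-write right
 * away. In that case the modified bit is set up front, since a later write
 * will no longer fault and could not be observed.
 */
bool
heur_read_fault(struct heur_page *p)
{
    if ((p->vpte & HEUR_VPTE_RW) &&
        p->write_after_read >= HEUR_THRESHOLD) {
        p->vpte |= HEUR_VPTE_M;
        return true;
    }
    return false;
}

/* Write fault on a page that was mapped read-only by an earlier read fault. */
void
heur_note_write_fault(struct heur_page *p)
{
    p->vpte |= HEUR_VPTE_M;
    p->write_after_read++;
}
```

The trade-off in a scheme like this is that a wrong guess marks a page modified when it never is, which costs an unnecessary writeback later in exchange for saving a fault now.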
<dillon at backplane.com>