git: kernel - Many fixes for vkernel support, plus a few main kernel fixes
dillon at crater.dragonflybsd.org
Thu Feb 2 18:29:06 PST 2017
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Tue Jan 31 20:14:05 2017 -0800
kernel - Many fixes for vkernel support, plus a few main kernel fixes
* The big enchilada is that the main kernel's thread switch code has
a small timing window where it clears the PM_ACTIVE bit for the cpu
while switching between two threads. However, it *ALSO* checks and
avoids loading the %cr3 if the two threads have the same pmap.
This results in a situation where an invalidation on the pmap in another
cpu may not have visibility to the cpu doing the switch, and yet the
cpu doing the switch also decides not to reload %cr3, and so does not
invalidate the TLB either. The result is a stale TLB and bad things
happen. For now, just unconditionally load %cr3 until I can come up
with code to handle the case.
This bug is very difficult to reproduce on a normal system; it requires
a multi-threaded program doing nasty things (munmap, etc) on one cpu
while another thread is switching to a third thread on some other cpu.
* KNOTE after handling the vkernel trap in postsig() instead of before.
* Change the kernel's pmap_inval_smp() code to take a 64-bit npgs
argument instead of a 32-bit npgs argument. This fixes situations
that crop up when a process uses more than 16TB of address space.
* Add an lfence to the pmap invalidation code that I think might be
needed.
* Handle some wrap/overflow cases in pmap_scan() related to the use of
large address spaces.
* Fix an unnecessary invltlb in pmap_clearbit() for unmanaged PTEs.
* Test PG_RW after locking the pv_entry to handle potential races.
* Add bio_crc to struct bio. This field is only used for debugging for
now but may come in useful later.
* Add some global debug variables in the pmap_inval_smp() and related
paths. Refactor the npgs handling.
* Load the tsc_target field after waiting for completion of the previous
invalidation op instead of before. Also add a conservative mfence()
in the invalidation path before loading the info fields.
* Remove the global pmap_inval_bulk_count counter.
* Adjust swtch.s to always reload the user process %cr3, with an
explanation. FIXME LATER!
* Add some test code to vm/swap_pager.c which double-checks that the page
being paged out does not get corrupted during the operation. This code
is #if 0'd.
* We must hold an object lock around the swp_pager_meta_ctl() call in
swp_pager_async_iodone(). I think.
* Reorder when PG_SWAPINPROG is cleared: finish the I/O before clearing
the bit.
* Change the vm_map_growstack() API to take a vm_map argument directly.
* Use atomic ops for vm_object->generation counts, since objects can be
accessed concurrently.
* Unconditionally save the FP state after returning from VMSPACE_CTL_RUN.
This solves a severe FP corruption bug in the vkernel due to calls it
makes into libc (which uses %xmm registers all over the place).
This is not a complete fix. We need a formal userspace/kernelspace FP
abstraction. Right now the vkernel doesn't have a kernelspace FP
abstraction, so if a kernel thread switches preemptively, bad things
happen.
* The kernel tracks and locks pv_entry structures to interlock pte's.
The vkernel never caught up, and does not really have a pv_entry or
placemark mechanism. The vkernel's pmap really needs a complete
re-port from the real-kernel pmap code. Until then, we use poor hacks.
* Use the vm_page's spinlock to interlock pte changes.
* Make sure that PG_WRITEABLE is set or cleared with the vm_page
spinlock held.
* Have pmap_clearbit() acquire the pmobj token for the pmap in the
iteration. This appears to be necessary, currently, as most of the
rest of the vkernel pmap code also uses the pmobj token.
* Fix bugs in the vkernel's swapu32() and swapu64().
* Change pmap_page_lookup() and pmap_unwire_pgtable() to fully busy
the page. Note, however, that a page table page is currently never
soft-busied. Also adjust other vkernel code that busies a page table
page.
* Fix some silly code in a pmap->pm_ptphint test.
* Don't inherit e.g. PG_M from the previous pte when overwriting it
with a pte of a different physical address.
* Change the vkernel's pmap_clear_modify() function to clear VPTE_RW
(which also clears VPTE_M), and not just VPTE_M. Formally we want
the vkernel to be notified when a page becomes modified and it won't
be unless we also clear VPTE_RW and force a fault. <--- I may change
this back after testing.
* Wrap pmap_replacevm() with a critical section.
* Scrap the old grow_stack() code. vm_fault() and vm_fault_page() handle
it (vm_fault_page() just now got the ability).
* Properly flag VM_FAULT_USERMODE.
Summary of changes:
sys/kern/kern_sig.c | 17 ++--
sys/platform/pc64/include/pmap_inval.h | 2 +-
sys/platform/pc64/x86_64/mp_machdep.c | 1 +
sys/platform/pc64/x86_64/pmap.c | 67 +++++++++----
sys/platform/pc64/x86_64/pmap_inval.c | 96 +++++++++++--------
sys/platform/pc64/x86_64/swtch.s | 21 +++-
sys/platform/pc64/x86_64/trap.c | 17 ++--
sys/platform/vkernel64/include/pmap.h | 1 +
sys/platform/vkernel64/include/pmap_inval.h | 4 +-
sys/platform/vkernel64/include/proc.h | 3 +-
sys/platform/vkernel64/platform/copyio.c | 9 +-
sys/platform/vkernel64/platform/pmap.c | 110 ++++++++++-----------
sys/platform/vkernel64/platform/pmap_inval.c | 137 ++++++++++++++-------------
sys/platform/vkernel64/x86_64/trap.c | 17 +++-
sys/platform/vkernel64/x86_64/vm_machdep.c | 12 ---
sys/sys/bio.h | 1 +
sys/vm/swap_pager.c | 111 ++++++++++++++++++----
sys/vm/vm_fault.c | 10 +-
sys/vm/vm_map.c | 19 +++-
sys/vm/vm_map.h | 2 +-
sys/vm/vm_object.c | 16 ++--
sys/vm/vm_page.c | 9 +-
sys/vm/vm_pageout.c | 26 ++---
23 files changed, 427 insertions(+), 281 deletions(-)
DragonFly BSD source repository