git: kernel - Many fixes for vkernel support, plus a few main kernel fixes

Matthew Dillon dillon at
Thu Feb 2 18:29:06 PST 2017

commit 95270b7ecb6e66e42892546590e8bbf44d405ea3
Author: Matthew Dillon <dillon at>
Date:   Tue Jan 31 20:14:05 2017 -0800

    kernel - Many fixes for vkernel support, plus a few main kernel fixes
    * The big enchilada is that the main kernel's thread switch code has
      a small timing window where it clears the PM_ACTIVE bit for the cpu
      while switching between two threads.  However, it *ALSO* checks and
      avoids loading %cr3 if the two threads have the same pmap.
      As a result, an invalidation of the pmap issued from another cpu
      may not be visible to the cpu doing the switch, and yet the cpu
      doing the switch also decides not to reload %cr3, so it does not
      invalidate its TLB either.  The result is a stale TLB, and bad things
      happen.  For now, just unconditionally load %cr3 until I can come up
      with code to handle the case.
      This bug is very difficult to reproduce on a normal system; it requires
      a multi-threaded program doing nasty things (munmap, etc.) on one cpu
      while another thread is switching to a third thread on some other cpu.
    * KNOTE after handling the vkernel trap in postsig() instead of before.
    * Change the kernel's pmap_inval_smp() code to take a 64-bit npgs
      argument instead of a 32-bit npgs argument.  This fixes situations
      that crop up when a process uses more than 16TB of address space.
    * Add an lfence to the pmap invalidation code that I think might be
      needed.
    * Handle some wrap/overflow cases in pmap_scan() related to the use of
      large address spaces.
    * Fix an unnecessary invltlb in pmap_clearbit() for unmanaged PTEs.
    * Test PG_RW after locking the pv_entry to handle potential races.
    * Add bio_crc to struct bio.  This field is only used for debugging for
      now but may come in useful later.
    * Add some global debug variables in the pmap_inval_smp() and related
      paths.  Refactor the npgs handling.
    * Load the tsc_target field after waiting for completion of the previous
      invalidation op instead of before.  Also add a conservative mfence()
      in the invalidation path before loading the info fields.
    * Remove the global pmap_inval_bulk_count counter.
    * Adjust swtch.s to always reload the user process %cr3, with an
      explanation.  FIXME LATER!
    * Add some test code to vm/swap_pager.c which double-checks that the page
      being paged out does not get corrupted during the operation.  This code
      is #if 0'd.
    * We must hold an object lock around the swp_pager_meta_ctl() call in
      swp_pager_async_iodone().  I think.
    * Reorder when PG_SWAPINPROG is cleared.  Finish the I/O before clearing
      the bit.
    * Change the vm_map_growstack() API to pass a vm_map in instead of
    * Use atomic ops for vm_object->generation counts, since objects can be
      locked shared.
    * Unconditionally save the FP state after returning from VMSPACE_CTL_RUN.
      This solves a severe FP corruption bug in the vkernel due to calls it
      makes into libc (which uses %xmm registers all over the place).
      This is not a complete fix.  We need a formal userspace/kernelspace FP
      abstraction.  Right now the vkernel doesn't have a kernelspace FP
      abstraction, so if a kernel thread switches preemptively, bad things
      happen.
    * The kernel tracks and locks pv_entry structures to interlock ptes.
      The vkernel never caught up and does not really have a pv_entry or
      placemark mechanism.  The vkernel's pmap really needs a complete
      re-port from the real-kernel pmap code.  Until then, we use poor hacks.
    * Use the vm_page's spinlock to interlock pte changes.
    * Make sure that PG_WRITEABLE is set or cleared with the vm_page
      spinlock held.
    * Have pmap_clearbit() acquire the pmobj token for the pmap in the
      iteration.  This appears to be necessary, currently, as most of the
      rest of the vkernel pmap code also uses the pmobj token.
    * Fix bugs in the vkernel's swapu32() and swapu64().
    * Change pmap_page_lookup() and pmap_unwire_pgtable() to fully busy
      the page, and do the same for other vkernel code that busies a page
      table page.  Note, however, that a page table page is currently never
      soft-busied.
    * Fix some silly code in a pmap->pm_ptphint test.
    * Don't inherit e.g. PG_M from the previous pte when overwriting it
      with a pte of a different physical address.
    * Change the vkernel's pmap_clear_modify() function to clear VPTE_RW
      (which also clears VPTE_M), and not just VPTE_M.  Formally, we want
      the vkernel to be notified when a page becomes modified and it won't
      be unless we also clear VPTE_RW and force a fault.  <--- I may change
      this back after testing.
    * Wrap pmap_replacevm() with a critical section.
    * Scrap the old grow_stack() code.  vm_fault() and vm_fault_page() handle
      it (vm_fault_page() just now got the ability).
    * Properly flag VM_FAULT_USERMODE.

Summary of changes:
 sys/kern/kern_sig.c                          |  17 ++--
 sys/platform/pc64/include/pmap_inval.h       |   2 +-
 sys/platform/pc64/x86_64/mp_machdep.c        |   1 +
 sys/platform/pc64/x86_64/pmap.c              |  67 +++++++++----
 sys/platform/pc64/x86_64/pmap_inval.c        |  96 +++++++++++--------
 sys/platform/pc64/x86_64/swtch.s             |  21 +++-
 sys/platform/pc64/x86_64/trap.c              |  17 ++--
 sys/platform/vkernel64/include/pmap.h        |   1 +
 sys/platform/vkernel64/include/pmap_inval.h  |   4 +-
 sys/platform/vkernel64/include/proc.h        |   3 +-
 sys/platform/vkernel64/platform/copyio.c     |   9 +-
 sys/platform/vkernel64/platform/pmap.c       | 110 ++++++++++-----------
 sys/platform/vkernel64/platform/pmap_inval.c | 137 ++++++++++++++-------------
 sys/platform/vkernel64/x86_64/trap.c         |  17 +++-
 sys/platform/vkernel64/x86_64/vm_machdep.c   |  12 ---
 sys/sys/bio.h                                |   1 +
 sys/vm/swap_pager.c                          | 111 ++++++++++++++++++----
 sys/vm/vm_fault.c                            |  10 +-
 sys/vm/vm_map.c                              |  19 +++-
 sys/vm/vm_map.h                              |   2 +-
 sys/vm/vm_object.c                           |  16 ++--
 sys/vm/vm_page.c                             |   9 +-
 sys/vm/vm_pageout.c                          |  26 ++---
 23 files changed, 427 insertions(+), 281 deletions(-)

DragonFly BSD source repository