git: kernel - VM rework part 12 - Core pmap work, stabilize & optimize

Matthew Dillon dillon at
Tue May 21 10:26:27 PDT 2019

commit e3c330f022862fd9619ec21263385481ca58da4f
Author: Matthew Dillon <dillon at>
Date:   Sun May 19 09:53:12 2019 -0700

    kernel - VM rework part 12 - Core pmap work, stabilize & optimize
    * Add tracking for the number of PTEs mapped writeable in md_page.
      Change how PG_WRITEABLE and PG_MAPPED are cleared in the vm_page
      to avoid clear/set races.  This problem occurs because we would
      have otherwise tried to clear the bits without hard-busying the
      page. This allows the bits to be set with only an atomic op.
      Procedures which test these bits universally do so while holding
      the page hard-busied, and now call pmap_mapped_sync() beforehand to
      properly synchronize the bits.
    * Fix bugs related to various counters.  pm_stats.resident_count,
      wiring counts, vm_page->md.writeable_count, and
    * Fix bugs related to synchronizing removed pte's with the vm_page.
      Fix one case where we were improperly updating (m)'s state based
      on a lost race against a pte swap-to-0 (pulling the pte).
    * Fix a bug related to the page soft-busying code when the
      m->object/m->pindex race is lost.
    * Implement a heuristic version of vm_page_active() which just
      updates act_count unlocked if the page is already in the
      PQ_ACTIVE queue, or if it is fictitious.
    * Allow races against the backing scan for pmap_remove_all() and
      pmap_page_protect(VM_PROT_READ).  Callers of these routines for
      these cases expect full synchronization of the page dirty state.
      We can identify when a page has not been fully cleaned out by
      checking vm_page->md.pmap_count and vm_page->md.writeable_count.
      In the rare situation where this happens, simply retry.
    * Assert that the PTE pindex is properly interlocked in pmap_enter().
      We still allow PTEs to be pulled by other routines without the
      interlock, but multiple pmap_enter()s of the same page will be
    * Assert additional wiring count failure cases.
    * (UNTESTED) Flag DEVICE pages (dev_pager_getfake()) as being
      PG_UNMANAGED.  This essentially prevents all the various
      reference counters (e.g. vm_page->md.pmap_count and
      vm_page->md.writeable_count), PG_M, PG_A, etc from being
      The vm_page's aren't tracked in the pmap at all because there
      is no way to find them; they are 'fake', so without a pv_entry
      we can't track them.  Instead we simply rely on the vm_map_backing
      scan to manipulate the PTEs.
    * Optimize the new vm_map_entry_shadow() to use a shared object
      token instead of an exclusive one.  OBJ_ONEMAPPING will be cleared
      with the shared token.
    * Optimize single-threaded access to pmaps to avoid pmap_inval_*()
    * Use __read_mostly for more globals.
    * Optimize pmap_testbit(), pmap_clearbit(), pmap_page_protect().
      Pre-check vm_page->md.writeable_count and vm_page->md.pmap_count
      for an easy degenerate-case return before doing any real work.
    * Optimize pmap_inval_smp() and pmap_inval_smp_cmpset() for the
      single-threaded pmap case, when called on the same CPU the pmap
      is associated with.  This allows us to use simple atomics and
      cpu_*() instructions and avoid the complexities of the
      pmap_inval_*() infrastructure.
    * Randomize the page queue used in bio_page_alloc().  This does not
      appear to hurt performance (e.g. heavy tmpfs use) on large many-core
      NUMA machines and it makes vm_page_alloc()'s job easier.
      This change might have a downside for temporary files, but for more
      long-lasting files there's no point allocating pages localized to a
      particular cpu.
    * Optimize vm_page_alloc().
      (1) Refactor the _vm_page_list_find*() routines to avoid re-scanning
          the same array indices over and over again when trying to find
          a page.
      (2) Add a heuristic, vpq.lastq, for each queue, which we set if a
          _vm_page_list_find*() operation had to go far-afield to find its
          page.  Subsequent finds will skip to the far-afield position until
          the current CPU's queues have pages again.
      (3) Reduce PQ_L2_SIZE from an extravagant 2048 entries per queue down
          to 1024.  The original 2048 was meant to provide 8-way
          set-associativity for 256 cores but wound up reducing performance
          due to longer index iterations.
    * Refactor the vm_page_hash[] array.  This array is used to shortcut
      vm_object locks and locate VM pages more quickly, without locks.
      The new code limits the size of the array to something more reasonable,
      implements a 4-way set-associative replacement policy using 'ticks',
      and rewrites the hashing math.
    * Effectively remove pmap_object_init_pt() for now.  In current tests
      it does not actually improve performance, probably because it may
      map pages that are not actually used by the program.
    * Remove vm_map_backing->refs.  This field is no longer used.
    * Remove more of the old now-stale code related to use of pv_entry's
      for terminal PTEs.
    * Remove more of the old shared page-table-page code.  This worked but
      could never be fully validated and was prone to bugs.  So remove it.
      In the future we will likely use larger 2MB and 1GB pages anyway.
    * Remove pmap_softwait()/pmap_softhold()/pmap_softdone().
    * Remove more #if 0'd code.
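
Illustration of the vm_page_hash[] item above: a minimal user-space sketch
of a 4-way set-associative lookup cache that uses a tick counter to choose
the entry to replace.  It is not the kernel code.  Every name here
(demo_page, hash_elm, demo_hash_*), the table size, and the hash mixing
constants are assumptions made up for the example, and the sketch is
single-threaded, so it omits the lockless validation the real array
requires.

```c
#include <stddef.h>
#include <stdint.h>

#define SET_WAYS   4        /* entries per set (4-way associative) */
#define HASH_SETS  1024     /* hypothetical size, must be a power of 2 */

/* Hypothetical stand-in for a vm_page: identified by (object, pindex). */
struct demo_page {
    const void *object;
    uint64_t    pindex;
};

struct hash_elm {
    struct demo_page *m;     /* cached page, NULL if the slot is empty */
    uint32_t          ticks; /* time of last enter or successful lookup */
};

static struct hash_elm demo_hash[HASH_SETS * SET_WAYS];

/* Map (object, pindex) to the first slot of its set.  Illustrative mixing. */
static struct hash_elm *
demo_hash_set(const void *object, uint64_t pindex)
{
    uint64_t h = (uint64_t)(uintptr_t)object + pindex;

    h ^= h >> 23;
    h *= 0x9E3779B97F4A7C15ULL;
    h ^= h >> 29;
    return &demo_hash[(h & (HASH_SETS - 1)) * SET_WAYS];
}

/* Return the cached page or NULL.  A hit refreshes the slot's timestamp. */
static struct demo_page *
demo_hash_lookup(const void *object, uint64_t pindex, uint32_t ticks)
{
    struct hash_elm *set = demo_hash_set(object, pindex);

    for (int i = 0; i < SET_WAYS; i++) {
        struct demo_page *m = set[i].m;

        if (m != NULL && m->object == object && m->pindex == pindex) {
            set[i].ticks = ticks;
            return m;
        }
    }
    return NULL;
}

/* Insert a page, preferring an empty slot, else the least recently used. */
static void
demo_hash_enter(struct demo_page *m, uint32_t ticks)
{
    struct hash_elm *set = demo_hash_set(m->object, m->pindex);
    struct hash_elm *victim = NULL;

    for (int i = 0; i < SET_WAYS; i++) {
        if (set[i].m == m) {            /* already cached, just refresh */
            set[i].ticks = ticks;
            return;
        }
        if (victim == NULL) {
            victim = &set[i];
        } else if (victim->m != NULL &&
                   (set[i].m == NULL ||
                    (int32_t)(set[i].ticks - victim->ticks) < 0)) {
            /* empty slot, or older than the current victim */
            victim = &set[i];
        }
    }
    victim->m = m;
    victim->ticks = ticks;
}
```

When a fifth page lands in a full set, the slot with the oldest timestamp is
overwritten, so a miss only means the caller has to fall back to the normal
(locked) lookup path.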

Summary of changes:
 sys/kern/kern_fork.c                   |   3 +-
 sys/kern/kern_synch.c                  |   8 +-
 sys/kern/vfs_bio.c                     |   5 +
 sys/platform/pc64/include/pmap.h       |   2 +
 sys/platform/pc64/x86_64/pmap.c        | 360 +++++++++++++++++++++++----------
 sys/platform/pc64/x86_64/pmap_inval.c  |  17 +-
 sys/platform/vkernel64/platform/pmap.c |  19 ++
 sys/vm/device_pager.c                  |   2 +-
 sys/vm/pmap.h                          |   2 +
 sys/vm/swap_pager.c                    |   4 +-
 sys/vm/vm_fault.c                      |   5 +-
 sys/vm/vm_map.c                        |  22 +-
 sys/vm/vm_map.h                        |   1 -
 sys/vm/vm_object.c                     |  10 +-
 sys/vm/vm_page.c                       | 209 ++++++++++++-------
 sys/vm/vm_page.h                       |  23 ++-
 sys/vm/vm_page2.h                      |   6 +-
 sys/vm/vm_pageout.c                    |   4 +-
 18 files changed, 487 insertions(+), 215 deletions(-)
