git: kernel - VM rework part 12 - Core pmap work, stabilize & optimize
dillon at crater.dragonflybsd.org
Tue May 21 10:26:27 PDT 2019
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Sun May 19 09:53:12 2019 -0700
kernel - VM rework part 12 - Core pmap work, stabilize & optimize
* Add tracking for the number of PTEs mapped writeable in md_page.
Change how PG_WRITEABLE and PG_MAPPED are cleared in the vm_page
to avoid clear/set races. This problem occurs because we would
have otherwise tried to clear the bits without hard-busying the
page. This allows the bits to be set with only an atomic op.
Procedures which test these bits universally do so while holding
the page hard-busied, and now call pmap_mapped_sync() first to
properly synchronize the bits.
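The scheme above can be sketched in miniature. This is a hypothetical
model, not the kernel code: mini_page, mini_enter(), and
mini_mapped_sync() are invented stand-ins for the vm_page, the
pmap_enter() path, and pmap_mapped_sync(). The point it illustrates is
that setters only need atomic ops, while clearing happens in one place,
under the hard-busy, driven by the counters:

```c
#include <assert.h>
#include <stdatomic.h>

#define PG_MAPPED    0x01
#define PG_WRITEABLE 0x02

/* Hypothetical miniature vm_page: flags set lock-free, with counters
 * tracking how many PTEs map the page and how many map it writeable. */
struct mini_page {
    _Atomic unsigned flags;
    _Atomic int pmap_count;      /* PTEs mapping this page */
    _Atomic int writeable_count; /* PTEs mapping it writeable */
};

/* A pmap_enter()-style path sets the bits with only atomic ops. */
static void mini_enter(struct mini_page *m, int writeable)
{
    unsigned bits = PG_MAPPED;

    atomic_fetch_add(&m->pmap_count, 1);
    if (writeable) {
        atomic_fetch_add(&m->writeable_count, 1);
        bits |= PG_WRITEABLE;
    }
    atomic_fetch_or(&m->flags, bits);
}

/* Stand-in for pmap_mapped_sync(): called with the page hard-busied,
 * it clears stale bits only when the counters show no remaining
 * mappings, avoiding clear/set races with concurrent mini_enter(). */
static void mini_mapped_sync(struct mini_page *m)
{
    if (atomic_load(&m->pmap_count) == 0)
        atomic_fetch_and(&m->flags, ~(unsigned)(PG_MAPPED | PG_WRITEABLE));
    else if (atomic_load(&m->writeable_count) == 0)
        atomic_fetch_and(&m->flags, ~(unsigned)PG_WRITEABLE);
}
```

Because clearing is confined to the synchronized path, the bits can
only ever lag behind the counters, never race ahead of them.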
* Fix bugs related to various counters: pm_stats.resident_count,
wiring counts, and vm_page->md.writeable_count.
* Fix bugs related to synchronizing removed pte's with the vm_page.
Fix one case where we were improperly updating (m)'s state based
on a lost race against a pte swap-to-0 (pulling the pte).
* Fix a bug related to the page soft-busying code when the
m->object/m->pindex race is lost.
* Implement a heuristic version of vm_page_active() which just
updates act_count unlocked if the page is already in the
PQ_ACTIVE queue, or if it is fictitious.
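A sketch of what such a heuristic looks like. This is a hypothetical
model (mini_page2, mini_page_active(), ACT_MAX are invented names), not
the actual vm_page_active() code: a page already on PQ_ACTIVE, or a
fictitious page, gets a best-effort unlocked act_count bump, and only
other pages pay for the fully locked requeue:

```c
#include <assert.h>
#include <stdatomic.h>

enum { PQ_INACTIVE, PQ_ACTIVE };
#define ACT_MAX 64

struct mini_page2 {
    _Atomic int queue;
    _Atomic int act_count;
    int fictitious;
    int locked_path_taken;   /* instrumentation for the sketch */
};

/* Hypothetical sketch: fast unlocked path when no requeue is needed. */
static void mini_page_active(struct mini_page2 *m)
{
    if (m->fictitious || atomic_load(&m->queue) == PQ_ACTIVE) {
        int c = atomic_load(&m->act_count);
        if (c < ACT_MAX)
            atomic_store(&m->act_count, c + 1); /* unlocked, best effort */
        return;
    }
    /* slow path: lock the queues and move the page (locking elided) */
    m->locked_path_taken = 1;
    atomic_store(&m->queue, PQ_ACTIVE);
}
```

A lost race here only costs an act_count increment, which is harmless,
so the common case avoids the queue locks entirely.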
* Allow races against the backing scan for pmap_remove_all() and
pmap_page_protect(VM_PROT_READ). Callers of these routines for
these cases expect full synchronization of the page dirty state.
We can identify when a page has not been fully cleaned out by
checking vm_page->md.pmap_count and vm_page->md.writeable_count.
In the rare situation where this happens, simply retry.
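The retry logic can be sketched as follows. This is a hypothetical
skeleton (mini_page3, backing_scan_remove(), mini_remove_all() are
invented names); the racing flag simulates a concurrent mapping
slipping in during the first backing scan pass:

```c
#include <assert.h>

struct mini_page3 {
    int pmap_count;
    int writeable_count;
    int racing;      /* simulate a concurrent pmap_enter() on pass 1 */
};

static int scan_passes;

/* One backing-scan pass: rips out every PTE it can see.  A racing
 * insertion can leave a residual mapping behind. */
static void backing_scan_remove(struct mini_page3 *m)
{
    scan_passes++;
    m->pmap_count = 0;
    m->writeable_count = 0;
    if (m->racing) {            /* a mapping slipped in under us */
        m->racing = 0;
        m->pmap_count = 1;
    }
}

/* pmap_remove_all()-style caller: the counters tell us whether the
 * page was fully cleaned out; if not, simply run the scan again. */
static void mini_remove_all(struct mini_page3 *m)
{
    do {
        backing_scan_remove(m);
    } while (m->pmap_count != 0 || m->writeable_count != 0);
}
```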
* Assert that the PTE pindex is properly interlocked in pmap_enter().
We still allow PTEs to be pulled by other routines without the
interlock, but multiple pmap_enter()s of the same page will be
properly interlocked.
* Assert additional wiring count failure cases.
* (UNTESTED) Flag DEVICE pages (dev_pager_getfake()) as being
PG_UNMANAGED. This essentially prevents all the various
reference counters (e.g. vm_page->md.pmap_count and
vm_page->md.writeable_count), PG_M, PG_A, etc. from being updated.
The vm_pages aren't tracked in the pmap at all because there
is no way to find them: they are 'fake', so without a pv_entry
we can't track them. Instead we simply rely on the vm_map_backing
scan to manipulate the PTEs.
* Optimize the new vm_map_entry_shadow() to use a shared object
token instead of an exclusive one. OBJ_ONEMAPPING will be cleared
with the shared token.
* Optimize single-threaded access to pmaps to avoid pmap_inval_*()
overhead.
* Apply __read_mostly to more globals.
* Optimize pmap_testbit(), pmap_clearbit(), pmap_page_protect().
Pre-check vm_page->md.writeable_count and vm_page->md.pmap_count
for an easy degenerate return before doing any real work.
* Optimize pmap_inval_smp() and pmap_inval_smp_cmpset() for the
single-threaded pmap case, when called on the same CPU the pmap
is associated with. This allows us to use simple atomics and
cpu_*() instructions and avoid the complexities of the IPI-based
invalidation path.
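A sketch of the dispatch decision. This is a hypothetical model
(mini_pmap, mini_inval_smp(), cpu_active_mask, and the counters are
invented for illustration), not the pmap_inval_smp() code itself: a
pmap active only on the calling CPU can be handled with simple local
operations, and only multi-CPU pmaps need cross-CPU synchronization:

```c
#include <assert.h>

static int local_invl;   /* local invalidations performed */
static int ipi_invl;     /* cross-CPU IPI rounds performed */

struct mini_pmap {
    unsigned cpu_active_mask;   /* CPUs the pmap is active on */
};

/* Hypothetical fast path: when the pmap's active mask contains only
 * the calling CPU, skip the IPI machinery entirely. */
static void mini_inval_smp(struct mini_pmap *pm, int mycpu)
{
    if (pm->cpu_active_mask == (1u << mycpu)) {
        local_invl++;    /* e.g. a local invlpg + simple atomics */
    } else {
        ipi_invl++;      /* full cross-CPU synchronization round */
    }
}
```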
* Randomize the page queue used in bio_page_alloc(). This does not
appear to hurt performance (e.g. heavy tmpfs use) on large many-core
NUMA machines and it makes the vm_page_alloc()'s job easier.
This change might have a downside for temporary files, but for more
long-lasting files there's no point allocating pages localized to a
particular CPU.
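The randomization itself is trivial; a hedged sketch (bio_pick_queue()
is an invented name, and rand() stands in for whatever cheap PRNG the
kernel actually uses) is just a random starting-queue pick:

```c
#include <assert.h>
#include <stdlib.h>

#define PQ_L2_SIZE 1024     /* per-queue count after this commit */

/* Hypothetical sketch: instead of always starting from the calling
 * CPU's localized queue, pick a random starting queue so buffer-cache
 * allocations spread evenly across the page queues. */
static int bio_pick_queue(void)
{
    return rand() & (PQ_L2_SIZE - 1);   /* power-of-2 mask, no modulo */
}
```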
* Optimize vm_page_alloc().
(1) Refactor the _vm_page_list_find*() routines to avoid re-scanning
the same array indices over and over again when trying to find a
page.
(2) Add a heuristic, vpq.lastq, for each queue, which we set if a
_vm_page_list_find*() operation had to go far-afield to find its
page. Subsequent finds will skip to the far-afield position until
the current CPUs queues have pages again.
(3) Reduce PQ_L2_SIZE from an extravagant 2048 entries per queue down
to 1024. The original 2048 was meant to provide 8-way
set-associativity for 256 cores but wound up reducing performance
due to longer index iterations.
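The lastq heuristic from (2) can be sketched like this. It is a
hypothetical model (mini_list_find(), NQ, the probes counter, and a
single global lastq instead of a per-queue field are all
simplifications): when the home queue is empty, start the scan at the
index where the previous find had to go, instead of re-walking the
same empty queues every time:

```c
#include <assert.h>

#define NQ 16

static struct { int avail; } queues[NQ];
static int lastq;    /* remembered far-afield index */
static int probes;   /* instrumentation: queues inspected per find */

/* Hypothetical sketch of the vpq.lastq heuristic. */
static int mini_list_find(int home)
{
    int start = queues[home].avail ? home : lastq;

    for (int i = 0; i < NQ; i++) {
        int q = (start + i) % NQ;
        probes++;
        if (queues[q].avail) {
            queues[q].avail--;
            lastq = q;      /* hint for the next caller; naturally
                             * resets once the home queue has pages */
            return q;
        }
    }
    return -1;
}
```

The first far-afield find pays the full scan; subsequent finds jump
straight to the remembered position.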
* Refactor the vm_page_hash array. This array is used to shortcut
vm_object locks and locate VM pages more quickly, without locks.
The new code limits the size of the array to something more reasonable,
implements a 4-way set-associative replacement policy using 'ticks',
and rewrites the hashing math.
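The shape of a 4-way set-associative cache with ticks-based
replacement can be sketched as below. Everything here is hypothetical
(hash_ent, hash_enter(), hash_lookup(), the mix constant, and the
fixed table size are invented; the real table is sized and hashed
differently and maps (object, pindex) pairs):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SET_WAYS  4
#define HASH_SETS 256              /* power of two; real sizing differs */

struct hash_ent {
    uintptr_t key;                 /* stands in for (object, pindex) */
    void     *page;
    int       ticks;               /* last-use time, for replacement */
};

static struct hash_ent page_hash[HASH_SETS][SET_WAYS];

/* Hypothetical hash mix; the commit rewrites the real math. */
static unsigned hash_set(uintptr_t key)
{
    uint64_t h = (uint64_t)key * 0x9e3779b97f4a7c15ULL;
    return (unsigned)(h >> 32) & (HASH_SETS - 1);
}

/* 4-way set-associative insert: reuse a matching or empty way,
 * otherwise evict the way with the oldest ticks. */
static void hash_enter(uintptr_t key, void *page, int ticks)
{
    struct hash_ent *set = page_hash[hash_set(key)];
    struct hash_ent *victim = &set[0];

    for (int w = 0; w < SET_WAYS; w++) {
        if (set[w].key == key || set[w].page == NULL) {
            victim = &set[w];
            break;
        }
        if (set[w].ticks < victim->ticks)
            victim = &set[w];
    }
    victim->key = key;
    victim->page = page;
    victim->ticks = ticks;
}

/* Lockless-style lookup: scan the set, refresh ticks on a hit. */
static void *hash_lookup(uintptr_t key, int ticks)
{
    struct hash_ent *set = page_hash[hash_set(key)];

    for (int w = 0; w < SET_WAYS; w++) {
        if (set[w].key == key && set[w].page != NULL) {
            set[w].ticks = ticks;
            return set[w].page;
        }
    }
    return NULL;
}
```

The set-associativity bounds both the scan length and the damage a
collision can do, while the ticks field gives a cheap LRU-like
replacement policy.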
* Effectively remove pmap_object_init_pt() for now. In current tests
it does not actually improve performance, probably because it may
map pages that are not actually used by the program.
* Remove vm_map_backing->refs. This field is no longer used.
* Remove more of the old now-stale code related to use of pv_entry's
for terminal PTEs.
* Remove more of the old shared page-table-page code. This worked but
could never be fully validated and was prone to bugs. So remove it.
In the future we will likely use larger 2MB and 1GB pages anyway.
* Remove pmap_softwait()/pmap_softhold()/pmap_softdone().
* Remove more #if 0'd code.
Summary of changes:
sys/kern/kern_fork.c | 3 +-
sys/kern/kern_synch.c | 8 +-
sys/kern/vfs_bio.c | 5 +
sys/platform/pc64/include/pmap.h | 2 +
sys/platform/pc64/x86_64/pmap.c | 360 +++++++++++++++++++++++----------
sys/platform/pc64/x86_64/pmap_inval.c | 17 +-
sys/platform/vkernel64/platform/pmap.c | 19 ++
sys/vm/device_pager.c | 2 +-
sys/vm/pmap.h | 2 +
sys/vm/swap_pager.c | 4 +-
sys/vm/vm_fault.c | 5 +-
sys/vm/vm_map.c | 22 +-
sys/vm/vm_map.h | 1 -
sys/vm/vm_object.c | 10 +-
sys/vm/vm_page.c | 209 ++++++++++++-------
sys/vm/vm_page.h | 23 ++-
sys/vm/vm_page2.h | 6 +-
sys/vm/vm_pageout.c | 4 +-
18 files changed, 487 insertions(+), 215 deletions(-)
DragonFly BSD source repository