segment pmap optimizations implemented for x86-64
Matthew Dillon
dillon at apollo.backplane.com
Wed Sep 12 18:44:26 PDT 2012
Experimental pmap optimizations are now in master and can be enabled
with a sysctl.
These optimizations affect ANY shared RW or RO mmap() or sysv shared
memory attachment which is a multiple of the segment size and which
is segment aligned, regardless of whether a process is threaded or
forked or separately exec'd.
mmap and sysv_shm now also segment-align conforming mappings
automatically. The segment size on x86-64 is 2MB.
Essentially what this does is cause the page table pages, NOT the
terminal pages, to be shared across all the mappings. The page tables
themselves are selectively shared.
This is NOT using 2MB physical pages, at least not yet. This solves
the problem particularly with postgres databases in another fashion
that happens to be generally useful throughout the system. We might
implement 2MB physical pages later on, leveraging the new infrastructure,
but it isn't on my personal list for now.
This is currently considered VERY experimental. The feature is disabled
by default but can be turned on at any time with sysctl
machdep.pmap_mmu_optimize=1.
-Matt
Commit message below:
commit 921c891ecf560602acfc7540df7a760f171e389e
kernel - Implement segment pmap optimizations for x86-64
* Implement 2MB segment optimizations for x86-64. Any shared read-only
or read-write VM object mapped into memory, including physical objects
(so both sysv_shm and mmap), which is a multiple of the segment size
and segment-aligned can be optimized.
* Enable with sysctl machdep.pmap_mmu_optimize=1
Default is off for now. This is an experimental feature.
* It works as follows: A VM object which is large enough will, when VM
faults are generated, store a truncated pmap (PD, PT, and PTEs) in the
VM object itself.
VM faults whose vm_map_entry's can be optimized will cause the PTE, PT,
and also the PD (for now) to be stored in a pmap embedded in the VM_OBJECT
instead of in the process pmap.
The process pmap then creates an entry in the PD page table that points
to the PT page table page stored in the VM_OBJECT's pmap.
* This removes nearly all page table overhead from fork()'d processes or
even unrelated processes which massively share data via mmap() or sysv_shm.
We still recommend using sysctl kern.ipc.shm_use_phys=1 (which is now
the default), which also removes the PV entries associated with the
shared pmap. However, with this optimization PV entries are no longer
a big issue since they will not be replicated in each process, only in
the common pmap stored in the VM_OBJECT.
* Features of this optimization:
* Number of PV entries is reduced to approximately the number of live
pages and no longer multiplied by the number of processes separately
mapping the shared memory.
* One process faulting in a page naturally makes the PTE available to
all other processes mapping the same shared memory. The other processes
do not have to fault that same page in.
* Page tables survive process exit and restart.
* Once page tables are populated and cached, any new process that maps
the shared memory will take far fewer faults because each fault will
bring in an ENTIRE page table. With postgres and 64 clients, the VM
fault rate was observed to drop from 1M faults/sec to less than 500 at
startup, and during the run the fault rate, instead of steadily
declining into the hundreds of thousands, dropped almost instantly to
virtually zero VM faults.
* We no longer have to depend on sysv_shm to optimize the MMU.
* CPU caches will do a better job caching page tables since most of
them are now themselves shared. Even when we invltlb, more of the
page tables will be in the L1, L2, and L3 caches.
* EXPERIMENTAL!!!!!
Matthew Dillon
<dillon at backplane.com>