git: kernel - Greatly improve shared memory fault rate concurrency / shared tokens
Matthew Dillon
dillon at crater.dragonflybsd.org
Tue Nov 15 01:34:13 PST 2011
commit 54341a3b445fade1bbc473141893a7e06c06ccb5
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Tue Nov 15 01:02:24 2011 -0800
kernel - Greatly improve shared memory fault rate concurrency / shared tokens
This commit rolls up a lot of work to improve postgres database operations
and the system in general. With these changes we can run pgbench -j 8 -c 40
on our 48-core Opteron monster at 140000+ tps, and the shm vm_fault rate
hits 3.1M pps.
* Implement shared tokens. They work as advertised, with some caveats.
It is acceptable to acquire a shared token while you already hold the same
token exclusively, but you will deadlock if you acquire an exclusive token
while you hold the same token shared.
Currently, exclusive tokens are not given priority over shared tokens, so
starvation is possible under certain circumstances.
* Create a critical code path in vm_fault() using the new shared token
feature to quickly fault-in pages which already exist in the VM cache.
pmap_object_init_pt() also uses the new feature.
This increases fault-in concurrency by a ridiculously huge amount,
particularly on SHM segments (say when you have a large number of postgres
clients). Scaling for large numbers of clients on large numbers of
cores is significantly improved.
This also increases fault-in concurrency for MAP_SHARED file maps.
* Expand the breadn() and cluster_read() APIs. Implement breadnx() and
cluster_readx(), which allow a getblk()'d bp to be passed in. If *bpp is
not NULL, a bp is being passed in; otherwise the routines call getblk().
* Modify the HAMMER read path to use the new API. Instead of calling
getcacheblk() HAMMER now calls getblk() and checks the B_CACHE flag.
This gives getblk() a chance to regenerate a fully cached buffer from
VM backing store without having to acquire any hammer-related locks,
resulting in even faster operation.
* If kern.ipc.shm_use_phys is set to 2 the VM pages will be pre-allocated.
This can take quite a while for a large map and can also lock the machine
up for a few seconds. Defaults to off.
* Reorder the smp_invltlb()/cpu_invltlb() combos in a few places, running
cpu_invltlb() last.
* An invalidation interlock might be needed in pmap_enter() under certain
circumstances; enable the code for now.
* vm_object_backing_scan_callback() was failing to properly check the
validity of a vm_object after acquiring its token. Add the required
check + some debugging.
* Make vm_object_set_writeable_dirty() a bit more cache friendly.
* The vmstats sysctl was scanning every process's vm_map (requiring a
vm_map read lock to do so), which can stall for long periods of time
when the system is paging heavily. Change the mechanic to an LWP flag
which can be tested with minimal locking.
* Have the phys_pager mark the page as dirty too, to make sure nothing
tries to free it.
* Remove the spinlock in pmap_prefault_ok(). Since we do not delete page
table pages, it shouldn't be needed.
* Add a required cpu_ccfence() in pmap_inval.c. The code generated prior
to this fix was still correct, and this makes sure it stays that way.
* Replace several manual wiring cases with calls to vm_page_wire().
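The ordering rule from the shared tokens item can be modelled in user space.
The C sketch below is a toy under stated assumptions, not the kernel's
implementation: struct toy_token, toy_try_shared() and toy_try_excl() are
hypothetical names, and instead of blocking they return 0 wherever a real
acquisition would have to wait.

```c
#include <assert.h>

#define TOY_NOBODY (-1)

/* Hypothetical user-space model of one shared/exclusive token. */
struct toy_token {
    int excl_owner;   /* id of the exclusive holder, or TOY_NOBODY */
    int excl_depth;   /* nested acquisitions by the exclusive holder */
    int shared_count; /* number of shared holds currently outstanding */
};

/*
 * Try to take the token shared.
 * Returns 1 on success, 0 if the caller would have to wait.
 */
int toy_try_shared(struct toy_token *tok, int tid)
{
    if (tok->excl_owner == tid) {
        /*
         * Caller already holds it exclusively: acceptable.
         * Modelling assumption: count it as a nested exclusive hold.
         */
        tok->excl_depth++;
        return 1;
    }
    if (tok->excl_owner != TOY_NOBODY)
        return 0;               /* someone else holds it exclusively */
    tok->shared_count++;
    return 1;
}

/*
 * Try to take the token exclusively.
 * Returns 1 on success, 0 if the caller would have to wait.
 */
int toy_try_excl(struct toy_token *tok, int tid)
{
    if (tok->excl_owner == tid) {
        tok->excl_depth++;      /* nested exclusive hold */
        return 1;
    }
    /*
     * Any outstanding shared hold forces a wait. If one of those holds
     * belongs to the caller itself, the wait can never end: that is the
     * deadlock case described in the commit message.
     */
    if (tok->excl_owner != TOY_NOBODY || tok->shared_count > 0)
        return 0;
    tok->excl_owner = tid;
    tok->excl_depth = 1;
    return 1;
}

/* Exercise both orderings with a single caller (id 1). */
void toy_selfcheck(void)
{
    struct toy_token a = { TOY_NOBODY, 0, 0 };
    struct toy_token b = { TOY_NOBODY, 0, 0 };

    assert(toy_try_excl(&a, 1) == 1);   /* exclusive first ...      */
    assert(toy_try_shared(&a, 1) == 1); /* ... then shared: fine    */

    assert(toy_try_shared(&b, 1) == 1); /* shared first ...         */
    assert(toy_try_excl(&b, 1) == 0);   /* ... then exclusive: stuck */
}
```

In this model the exclusive-then-shared ordering succeeds on both calls,
while shared-then-exclusive gets 0 on the second call. In the kernel that
second call would wait on the thread's own shared hold.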
Summary of changes:
sys/gnu/vfs/ext2fs/ext2_alloc.c | 2 +
sys/gnu/vfs/ext2fs/ext2_balloc.c | 2 +
sys/gnu/vfs/ext2fs/ext2_inode.c | 2 +
sys/gnu/vfs/ext2fs/ext2_linux_balloc.c | 3 +-
sys/gnu/vfs/ext2fs/ext2_linux_ialloc.c | 1 +
sys/gnu/vfs/ext2fs/ext2_subr.c | 2 +
sys/kern/lwkt_thread.c | 65 +---
sys/kern/lwkt_token.c | 726 +++++++++++++-------------------
sys/kern/sysv_shm.c | 36 ++-
sys/kern/usched_bsd4.c | 2 -
sys/kern/vfs_bio.c | 34 +-
sys/kern/vfs_cluster.c | 8 +-
sys/platform/pc32/i386/pmap.c | 6 +-
sys/platform/pc64/x86_64/pmap.c | 18 +-
sys/platform/pc64/x86_64/pmap_inval.c | 7 +-
sys/platform/vkernel/platform/pmap.c | 9 +-
sys/platform/vkernel64/platform/pmap.c | 9 +-
sys/sys/buf.h | 6 +-
sys/sys/buf2.h | 26 ++
sys/sys/globaldata.h | 5 +-
sys/sys/proc.h | 1 +
sys/sys/thread.h | 29 +-
sys/sys/thread2.h | 5 +-
sys/vfs/hammer/hammer_io.c | 1 +
sys/vfs/hammer/hammer_ondisk.c | 1 +
sys/vfs/hammer/hammer_vnops.c | 21 +-
sys/vfs/hammer/hammer_volume.c | 2 +
sys/vfs/hpfs/hpfs_alsubr.c | 2 +
sys/vfs/hpfs/hpfs_subr.c | 2 +
sys/vfs/hpfs/hpfs_vfsops.c | 1 +
sys/vfs/hpfs/hpfs_vnops.c | 1 +
sys/vfs/isofs/cd9660/cd9660_lookup.c | 2 +
sys/vfs/isofs/cd9660/cd9660_rrip.c | 2 +
sys/vfs/isofs/cd9660/cd9660_vfsops.c | 2 +
sys/vfs/isofs/cd9660/cd9660_vnops.c | 2 +
sys/vfs/msdosfs/msdosfs_denode.c | 2 +
sys/vfs/msdosfs/msdosfs_fat.c | 2 +
sys/vfs/msdosfs/msdosfs_lookup.c | 2 +
sys/vfs/msdosfs/msdosfs_vfsops.c | 2 +
sys/vfs/ntfs/ntfs_subr.c | 4 +-
sys/vfs/ntfs/ntfs_vfsops.c | 2 +
sys/vfs/ntfs/ntfs_vnops.c | 2 +
sys/vfs/tmpfs/tmpfs_vnops.c | 2 +
sys/vfs/udf/udf_vfsops.c | 2 +
sys/vfs/ufs/ffs_alloc.c | 3 +-
sys/vfs/ufs/ffs_balloc.c | 2 +
sys/vfs/ufs/ffs_inode.c | 1 +
sys/vfs/ufs/ffs_subr.c | 2 +
sys/vfs/ufs/ffs_vfsops.c | 2 +
sys/vfs/userfs/userfs_vnops.c | 3 +
sys/vm/phys_pager.c | 2 +-
sys/vm/vm_fault.c | 344 +++++++++++++---
sys/vm/vm_kern.c | 1 +
sys/vm/vm_map.h | 3 +-
sys/vm/vm_meter.c | 32 +--
sys/vm/vm_object.c | 73 +++-
sys/vm/vm_object.h | 7 +-
sys/vm/vm_page.c | 70 +++-
sys/vm/vm_page.h | 18 +-
59 files changed, 961 insertions(+), 665 deletions(-)
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/54341a3b445fade1bbc473141893a7e06c06ccb5
--
DragonFly BSD source repository