git: kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes

Matthew Dillon dillon at crater.dragonflybsd.org
Tue Oct 18 11:20:15 PDT 2011


commit b12defdc619df06fafb50cc7535a919224daa63c
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date:   Tue Oct 18 10:36:11 2011 -0700

    kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes
    
    This is a very large patch which reworks locking in the entire VM subsystem,
    concentrated on VM objects and the x86-64 pmap code.  These fixes remove
    nearly all the spin lock contention for non-threaded VM faults and narrow
    contention for threaded VM faults to just the threads sharing the pmap.
    
    Multi-socket many-core machines will see a 30-50% improvement in parallel
    build performance (tested on a 48-core opteron), depending on how well
    the build parallelizes.
    
    As part of this work a long-standing problem on 64-bit systems where programs
    would occasionally seg-fault or bus-fault for no reason has been fixed.  The
    problem was related to races between vm_fault, the vm_object collapse code,
    and the vm_map splitting code.
    
    * Most uses of vm_token have been removed.  All uses of vm_spin have been
      removed.  These have been replaced with per-object tokens and per-queue
      (vm_page_queues[]) spin locks.
    
      Note in particular that since we still have the page coloring code the
      PQ_FREE and PQ_CACHE queues are actually many queues, individually
      spin-locked, resulting in excellent MP page allocation and freeing
      performance.
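    
      A minimal userland sketch of the per-queue locking idea, not the
      kernel code: each color gets its own queue and its own spin lock,
      so a free or allocation touching one color does not contend with
      another.  NCOLORS, struct page_queue and the *_sketch names are
      hypothetical, and C11 atomics stand in for the kernel spin locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCOLORS 8                 /* hypothetical color count */

struct page {
    struct page *next;
    unsigned     color;           /* selects the per-color queue */
};

struct page_queue {
    atomic_int   locked;          /* per-queue spin lock: 0 free, 1 held */
    struct page *head;
};

/* One independently locked free queue per color (zero-initialized). */
static struct page_queue free_queues[NCOLORS];

static void pq_lock(struct page_queue *q)
{
    while (atomic_exchange_explicit(&q->locked, 1, memory_order_acquire))
        ;                         /* spin until the holder releases */
}

static void pq_unlock(struct page_queue *q)
{
    atomic_store_explicit(&q->locked, 0, memory_order_release);
}

/* Return a page to the queue for its color; only that queue is locked. */
void page_free_sketch(struct page *p)
{
    struct page_queue *q;

    assert(p != NULL);
    q = &free_queues[p->color % NCOLORS];
    pq_lock(q);
    p->next = q->head;
    q->head = p;
    pq_unlock(q);
}

/* Take a page, preferring the requested color, then trying the others. */
struct page *page_alloc_sketch(unsigned color)
{
    for (unsigned i = 0; i < NCOLORS; i++) {
        struct page_queue *q = &free_queues[(color + i) % NCOLORS];
        struct page *p;

        pq_lock(q);
        p = q->head;
        if (p != NULL)
            q->head = p->next;
        pq_unlock(q);
        if (p != NULL)
            return p;
    }
    return NULL;                  /* every queue was empty */
}
```

      Two CPUs working on different colors never touch the same lock
      word, which is the property the note above relies on.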
    
    * Reworked vm_page_lookup() and vm_object->rb_memq.  All (object,pindex)
      lookup operations are now covered by the vm_object hold/drop system,
      which utilizes pool tokens on vm_objects.  Calls now require that the
      VM object be held in order to ensure a stable outcome.
    
      Also added vm_page_lookup_busy_wait(), vm_page_lookup_busy_try(),
      vm_page_busy_wait(), vm_page_busy_try(), and other API functions
      which integrate the PG_BUSY handling.
    
    * Added OBJ_CHAINLOCK.  Most vm_object operations are protected by
      the vm_object_hold/drop() facility which is token-based.  Certain
      critical functions which must traverse backing_object chains use
      a hard-locking flag and lock almost the entire chain as it is traversed
      to prevent races against object deallocation, collapses, and splits.
    
      The last object in the chain (typically a vnode) is NOT locked in
      this manner, so concurrent faults which terminate at the same vnode will
      still have good performance.  This is important e.g. for parallel compiles
      which might be running dozens of the same compiler binary concurrently.
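    
      The chain walk can be pictured roughly as follows.  This is a
      simplified userland sketch under stated assumptions, not the kernel
      implementation: struct vobj, the chainlock field and both function
      names are hypothetical, and a spin on a C11 atomic stands in for
      the hard-locking flag.  Real code would also have to revalidate the
      backing pointer after taking each lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct vobj {
    struct vobj *backing;     /* next object in the chain, NULL at the end */
    atomic_int   chainlock;   /* 1 while a chain walker holds the object */
};

/*
 * Lock every object in the chain except the last one (typically the
 * vnode object).  Returns the number of objects that were locked.
 */
int chain_lock_sketch(struct vobj *first)
{
    int n = 0;

    assert(first != NULL);
    for (struct vobj *o = first; o->backing != NULL; o = o->backing) {
        while (atomic_exchange_explicit(&o->chainlock, 1,
                                        memory_order_acquire))
            ;                 /* wait for a competing walker to finish */
        n++;
    }
    return n;
}

/* Release the locks taken by chain_lock_sketch() on the same chain. */
void chain_unlock_sketch(struct vobj *first)
{
    struct vobj *o = first;

    assert(first != NULL);
    while (o->backing != NULL) {
        struct vobj *next = o->backing;   /* read while still locked */

        atomic_store_explicit(&o->chainlock, 0, memory_order_release);
        o = next;
    }
}
```

      Because the terminal object is skipped, two faults whose chains end
      at the same vnode object do not serialize on it.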
    
    * Created a per vm_map token and removed most uses of vmspace_token.
    
    * Removed the mp_lock in sys_execve().  It has not been needed in a while.
    
    * Add kmem_lim_size() which returns approximate available memory (reduced
      by available KVM), in megabytes.  This is now used to scale up the
      slab allocator cache and the pipe buffer caches to reduce unnecessary
      global kmem operations.
    
    * Rewrote vm_page_alloc(), various bits in vm/vm_contig.c, the swapcache
      scan code, and the pageout scan code.  These routines were rewritten
      to use the per-queue spin locks.
    
    * Replaced the exponential backoff in the spinlock code with something
      a bit less complex and cleaned it up.
    
    * Restructured the IPIQ func/arg1/arg2 array for better cache locality.
      Removed the per-queue ip_npoll and replaced it with a per-cpu gd_npoll,
      which is used by other cores to determine if they need to issue an
      actual hardware IPI or not.  This reduces hardware IPI issuance
      considerably (and the removal of the decontention code reduced it even
      more).
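    
      One way a per-cpu poll flag can suppress redundant hardware IPIs is
      sketched below.  This is an illustration only: the structure, the
      function names and the counter standing in for the actual IPI are
      all hypothetical, and the real gd_npoll handling may differ.

```c
#include <assert.h>
#include <stdatomic.h>

#define NCPUS 4                     /* hypothetical CPU count */

struct cpu_sketch {
    atomic_int npoll;               /* 1: this cpu was already told to poll */
    atomic_int hw_ipis;             /* stand-in counter for hardware IPIs */
};

static struct cpu_sketch cpus[NCPUS];

/* Called by a sender after it has queued work for `target`. */
void ipiq_notify_sketch(int target)
{
    assert(target >= 0 && target < NCPUS);
    /* Only the sender that flips npoll from 0 to 1 pays for a real IPI. */
    if (atomic_exchange(&cpus[target].npoll, 1) == 0)
        atomic_fetch_add(&cpus[target].hw_ipis, 1);   /* "send" the IPI */
}

/* Called on `self` when it begins draining its incoming queues. */
void ipiq_process_sketch(int self)
{
    assert(self >= 0 && self < NCPUS);
    atomic_store(&cpus[self].npoll, 0);
    /* ... drain the queued func/arg1/arg2 entries here ... */
}

/* Number of simulated hardware IPIs delivered to `cpu`. */
int hw_ipi_count_sketch(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    return atomic_load(&cpus[cpu].hw_ipis);
}
```

      Any number of senders queueing to the same target between two
      drains cost a single interrupt in this model.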
    
    * Temporarily removed the lwkt thread fairq code and disabled a number of
      features.  These will be worked back in once we track down some of the
      remaining performance issues.
    
      Temporarily removed the lwkt thread resequencer for tokens for the same
      reason.  This might wind up being permanent.
    
      Added splz_check()s in a few critical places.
    
    * Increased the number of pool tokens from 1024 to 4001 and went to a
      prime-number mod algorithm to reduce overlaps.
    
    * Removed the token decontention code.  This was a bit of an eyesore and
      while it did its job when we had global locks it just gets in the way now
      that most of the global locks are gone.
    
      Replaced the decontention code with a fallback which acquires the
      tokens in sorted order, to guarantee that deadlocks will always be
      resolved eventually in the scheduler.
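    
      The sorted-order idea can be sketched in userland as follows.  It
      assumes address order as the sort key and uses a C11 atomic as a
      stand-in lock; struct token_sketch and both function names are
      hypothetical.  If every thread takes its tokens in the same global
      order, no cycle of waiters can form.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct token_sketch {
    atomic_int held;              /* 0 free, 1 held */
};

/* Order token pointers by address so all threads agree on the order. */
static int cmp_token_addr(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)*(struct token_sketch *const *)a;
    uintptr_t y = (uintptr_t)*(struct token_sketch *const *)b;

    return (x > y) - (x < y);
}

/* Sort the wanted tokens in place, then acquire them lowest first. */
void tokens_acquire_sorted(struct token_sketch **toks, size_t n)
{
    assert(toks != NULL || n == 0);
    if (n > 1)
        qsort(toks, n, sizeof(*toks), cmp_token_addr);
    for (size_t i = 0; i < n; i++) {
        while (atomic_exchange_explicit(&toks[i]->held, 1,
                                        memory_order_acquire))
            ;                     /* spin until this token is free */
    }
}

/* Release in reverse order of acquisition. */
void tokens_release_sorted(struct token_sketch **toks, size_t n)
{
    assert(toks != NULL || n == 0);
    for (size_t i = n; i-- > 0; )
        atomic_store_explicit(&toks[i]->held, 0, memory_order_release);
}
```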
    
    * Introduced a simplified spin-for-a-little-while function
      _lwkt_trytoken_spin() that the token code now uses rather than giving
      up immediately.
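    
      A bounded try-with-spin can look roughly like this.  It is a sketch
      of the general technique only; the structure, the function names
      and the max_spins parameter are hypothetical and do not reflect the
      actual _lwkt_trytoken_spin() signature.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct tok_sketch {
    atomic_int held;              /* 0 free, 1 held */
};

/*
 * Try to take the token, retrying up to max_spins times before giving
 * up.  Returns true on success; on false the caller would fall back to
 * a slower path such as blocking.
 */
bool trytoken_spin_sketch(struct tok_sketch *t, int max_spins)
{
    assert(max_spins > 0);
    for (int i = 0; i < max_spins; i++) {
        int expected = 0;

        if (atomic_compare_exchange_strong_explicit(&t->held, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return true;
    }
    return false;
}

/* Release a token taken by trytoken_spin_sketch(). */
void reltoken_sketch(struct tok_sketch *t)
{
    atomic_store_explicit(&t->held, 0, memory_order_release);
}
```

      Spinning briefly pays off when the holder is expected to release
      quickly, since it avoids a trip through the scheduler.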
    
    * The vfs_bio subsystem no longer uses vm_token and now uses the
      vm_object_hold/drop API for buffer cache operations, resulting
      in very good concurrency.
    
    * Gave the vnode its own spinlock instead of sharing vp->v_lock.lk_spinlock,
      which fixes a deadlock.
    
    * Adjusted all platform pmap.c's to handle the new main kernel APIs.  The
      i386 pmap.c is still a bit out of date but should be compatible.
    
    * Completely rewrote very large chunks of the x86-64 pmap.c code.  The
      critical path no longer needs pmap_spin but pmap_spin itself is still
      used heavily, particularly in the pv_entry handling code.
    
      A per-pmap token and per-pmap object are now used to serialize pmap
      access and vm_page lookup operations when needed.
    
      The x86-64 pmap.c code now uses only vm_page->crit_count instead of
      both crit_count and hold_count, which fixes races against other parts of
      the kernel that use vm_page_hold().
    
      _pmap_allocpte() mechanics have been completely rewritten to remove
      potential races.  Much of pmap_enter() and pmap_enter_quick() has also
      been rewritten.
    
      Many other changes.
    
    * The following subsystems (and probably more) no longer use the vm_token
      or vmobj_token in critical paths:
    
      x The swap_pager now uses the vm_object_hold/drop API instead of vm_token.
    
      x mmap() and vm_map/vm_mmap in general now use the vm_object_hold/drop API
        instead of vm_token.
    
      x vnode_pager
    
      x zalloc
    
      x vm_page handling
    
      x vfs_bio
    
      x umtx system calls
    
      x vm_fault and friends
    
    * Minor fixes to fill_kinfo_proc() to deal with process scan panics (ps)
      revealed by recent global lock removals.
    
    * lockmgr() locks no longer support LK_NOSPINWAIT.  Spin locks are
      unconditionally acquired.
    
    * Replaced netif/e1000's spinlocks with lockmgr locks.  The spinlocks
      were not appropriate owing to the large context they were covering.
    
    * Misc atomic ops added

Summary of changes:
 sys/cpu/i386/include/atomic.h                      |   11 +
 sys/cpu/i386/include/cpu.h                         |    5 +-
 sys/cpu/x86_64/include/atomic.h                    |   11 +
 sys/cpu/x86_64/include/cpu.h                       |    5 +-
 sys/dev/agp/agp.c                                  |   18 +-
 sys/dev/agp/agp_i810.c                             |    9 +-
 sys/dev/netif/e1000/e1000_osdep.h                  |   15 +-
 sys/dev/netif/e1000/if_em.h                        |   32 +-
 sys/emulation/43bsd/43bsd_vm.c                     |    2 -
 .../linux/i386/linprocfs/linprocfs_misc.c          |   29 +-
 sys/emulation/linux/i386/linux_machdep.c           |    6 +-
 sys/kern/imgact_aout.c                             |    9 +-
 sys/kern/imgact_elf.c                              |   59 +-
 sys/kern/init_main.c                               |    2 +-
 sys/kern/kern_clock.c                              |   25 +-
 sys/kern/kern_exec.c                               |   12 +-
 sys/kern/kern_kinfo.c                              |   27 +-
 sys/kern/kern_lock.c                               |    9 +-
 sys/kern/kern_slaballoc.c                          |  182 ++--
 sys/kern/kern_spinlock.c                           |  200 ++--
 sys/kern/kern_synch.c                              |   10 +-
 sys/kern/kern_umtx.c                               |    6 -
 sys/kern/kern_xio.c                                |   12 -
 sys/kern/link_elf.c                                |    4 +-
 sys/kern/link_elf_obj.c                            |    4 +-
 sys/kern/lwkt_ipiq.c                               |  119 ++-
 sys/kern/lwkt_thread.c                             |  434 ++++----
 sys/kern/lwkt_token.c                              |  478 +++++---
 sys/kern/sys_pipe.c                                |   24 +
 sys/kern/sys_process.c                             |    9 +-
 sys/kern/sysv_shm.c                                |    4 +-
 sys/kern/tty.c                                     |    7 +-
 sys/kern/uipc_syscalls.c                           |   32 +-
 sys/kern/vfs_bio.c                                 |  132 ++-
 sys/kern/vfs_cache.c                               |   50 +-
 sys/kern/vfs_cluster.c                             |    5 +
 sys/kern/vfs_journal.c                             |   12 +-
 sys/kern/vfs_lock.c                                |   31 +-
 sys/kern/vfs_mount.c                               |    6 +-
 sys/kern/vfs_subr.c                                |   42 +-
 sys/kern/vfs_vm.c                                  |   17 +-
 sys/platform/pc32/i386/machdep.c                   |    4 -
 sys/platform/pc32/i386/pmap.c                      |  179 ++--
 sys/platform/pc32/include/pmap.h                   |   10 +
 sys/platform/pc64/include/pmap.h                   |   14 +-
 sys/platform/pc64/x86_64/pmap.c                    |  908 +++++++++------
 sys/platform/vkernel/conf/files                    |    1 +
 sys/platform/vkernel/i386/cpu_regs.c               |    4 +-
 sys/platform/vkernel/i386/mp.c                     |    2 +
 sys/platform/vkernel/include/pmap.h                |   10 +
 sys/platform/vkernel/platform/pmap.c               |  159 ++-
 sys/platform/vkernel64/conf/files                  |    1 +
 sys/platform/vkernel64/include/pmap.h              |   10 +
 sys/platform/vkernel64/platform/pmap.c             |  183 ++--
 sys/platform/vkernel64/x86_64/cpu_regs.c           |    9 +-
 sys/platform/vkernel64/x86_64/mp.c                 |    4 +-
 sys/sys/globaldata.h                               |   11 +-
 sys/sys/lock.h                                     |    2 +-
 sys/sys/malloc.h                                   |    1 +
 sys/sys/param.h                                    |    1 +
 sys/sys/spinlock.h                                 |   14 +-
 sys/sys/spinlock2.h                                |   96 ++-
 sys/sys/thread.h                                   |   16 +-
 sys/sys/time.h                                     |    1 +
 sys/sys/vnode.h                                    |    6 +-
 sys/vfs/devfs/devfs_vnops.c                        |    2 +-
 sys/vfs/nwfs/nwfs_io.c                             |    2 +-
 sys/vfs/procfs/procfs_map.c                        |   31 +-
 sys/vfs/smbfs/smbfs_io.c                           |    2 +-
 sys/vm/device_pager.c                              |    6 +-
 sys/vm/phys_pager.c                                |    2 -
 sys/vm/pmap.h                                      |    5 +-
 sys/vm/swap_pager.c                                |  216 +++--
 sys/vm/vm.h                                        |    1 +
 sys/vm/vm_contig.c                                 |  171 ++--
 sys/vm/vm_fault.c                                  |  533 ++++++----
 sys/vm/vm_glue.c                                   |   10 +-
 sys/vm/vm_kern.c                                   |    9 +-
 sys/vm/vm_map.c                                    |  508 ++++++---
 sys/vm/vm_map.h                                    |   39 +-
 sys/vm/vm_meter.c                                  |    4 +-
 sys/vm/vm_mmap.c                                   |   58 +-
 sys/vm/vm_object.c                                 | 1160 ++++++++++++--------
 sys/vm/vm_object.h                                 |   34 +-
 sys/vm/vm_page.c                                   | 1180 ++++++++++++++------
 sys/vm/vm_page.h                                   |  169 +--
 sys/vm/vm_page2.h                                  |   55 +
 sys/vm/vm_pageout.c                                |  619 +++++++----
 sys/vm/vm_swap.c                                   |   59 +-
 sys/vm/vm_swapcache.c                              |  152 ++-
 sys/vm/vm_unix.c                                   |    8 +-
 sys/vm/vm_vmspace.c                                |    5 +-
 sys/vm/vm_zone.c                                   |    2 -
 sys/vm/vnode_pager.c                               |  148 ++-
 94 files changed, 5533 insertions(+), 3409 deletions(-)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b12defdc619df06fafb50cc7535a919224daa63c


-- 
DragonFly BSD source repository




