git: kernel - Optimize spinlocks for 48-core contention

Wed Oct 26 11:39:12 PDT 2011

commit 43e72e79e549059473a43a7f99e1b469564c28d0
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date:   Wed Oct 26 11:18:54 2011 -0700

    kernel - Optimize spinlocks for 48-core contention
    
    * Change the spinlock algorithm to do a read-test before atomic_swap_int().
      This has no effect on single-chip CPUs (tested on a Phenom II quad-core),
      but has a HUGE HUGE HUGE effect on multi-chip/many-core systems.  On
      monster (48-core Opteron / 4 x 12-core chips) concurrent kernel compile
      time is reduced from 170 seconds to 75 seconds with this one change.
      That's a 56% reduction in build time, or roughly a 2.3x speedup.
    
      The change matters because it unloads the hardware cache coherency
      bus and communication by creating a closed loop with the pre-read:
      a waiting cpu passively waits for the cache update instead of
      actively issuing a locked bus cycle memory op.  This prevents total
      armageddon on the memory buses when a substantial number of cores
      are doing real work.
    
    * Increase the number of pool spinlocks from 1024 to 8192.  We need them
      now that vm_pages use pool spinlocks.
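
The read-test described in the first item is commonly known as a test-and-test-and-set loop. What follows is a minimal userland sketch of that pattern, not the code from sys/kern/kern_spinlock.c: it assumes C11 <stdatomic.h>, uses atomic_exchange_explicit() as a stand-in for the kernel's atomic_swap_int(), and the demo_spinlock type and demo_spin_*() names are hypothetical, made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not DragonFly's struct spinlock. */
struct demo_spinlock {
    atomic_int locked;              /* 0 = free, 1 = held */
};

static void
demo_spin_init(struct demo_spinlock *spin)
{
    atomic_init(&spin->locked, 0);
}

/* Single acquisition attempt; returns true if the lock was obtained. */
static bool
demo_spin_trylock(struct demo_spinlock *spin)
{
    /* Read-test: a plain load, which can be satisfied from a shared cache line. */
    if (atomic_load_explicit(&spin->locked, memory_order_relaxed) != 0)
        return false;
    /* Only now issue the locked read-modify-write (the "swap"). */
    return atomic_exchange_explicit(&spin->locked, 1,
                                    memory_order_acquire) == 0;
}

static void
demo_spin_lock(struct demo_spinlock *spin)
{
    for (;;) {
        /* Spin on reads until the holder's release becomes visible. */
        while (atomic_load_explicit(&spin->locked,
                                    memory_order_relaxed) != 0)
            ;
        /* The lock looked free; try to take it, and go back to reading if we lost the race. */
        if (atomic_exchange_explicit(&spin->locked, 1,
                                     memory_order_acquire) == 0)
            return;
    }
}

static void
demo_spin_unlock(struct demo_spinlock *spin)
{
    atomic_store_explicit(&spin->locked, 0, memory_order_release);
}
```

A caller brackets its critical section with demo_spin_lock() and demo_spin_unlock(). The design point is that waiters spin on plain loads and retry the swap only after seeing the lock word read as free, so a contended lock is not hit with a locked operation on every spin iteration. A production lock would typically also add a cpu pause hint and some form of backoff inside the read loop, both omitted here.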

Summary of changes:
 sys/kern/kern_spinlock.c |   97 ++++++++++++++++++++++++++++++++--------------
 1 files changed, 68 insertions(+), 29 deletions(-)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/43e72e79e549059473a43a7f99e1b469564c28d0


-- 
DragonFly BSD source repository