git: kernel - Optimize spinlocks for 48-core contention
dillon at crater.dragonflybsd.org
Wed Oct 26 11:39:12 PDT 2011
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Wed Oct 26 11:18:54 2011 -0700
kernel - Optimize spinlocks for 48-core contention
* Change the spinlock algorithm to do a read-test before atomic_swap_int().
This has no effect on single-chip cpus (tested on phenom II quad-core),
but has a HUGE HUGE HUGE effect on multi-chip/many-core systems. On
monster (48-core opteron / 4 x 12-core chips) concurrent kernel compile
time is reduced from 170 seconds to 75 seconds with this one change.
That's a speedup of well over 100% (170/75 is roughly 2.27x).
The change is important because it unloads the hardware
cache coherency bus and communication by creating a closed loop with
the pre-read, which essentially passively waits for the cache update
instead of actively issuing a locked bus cycle memory op. This prevents
total armageddon on the memory busses when a substantial number of
cores are doing real work.
* Increase the number of pool spinlocks from 1024 to 8192. We need them
now that vm_pages use pool spinlocks.
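
For illustration only, the read-test-before-swap idea and an 8192-entry
lock pool can be sketched in portable C11. This is not the code in
kern_spinlock.c: every sketch_* name is hypothetical, C11
atomic_exchange stands in for atomic_swap_int(), and the address hash
is an arbitrary assumption.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and names; a sketch, not the kernel's spinlock. */
struct sketch_spinlock {
    atomic_int locked;              /* 0 = free, 1 = held */
};

static void sketch_spin_lock(struct sketch_spinlock *sp)
{
    for (;;) {
        /*
         * Read-test: spin on a plain load.  While the lock is held the
         * cache line can stay shared in each waiter's cache, so waiting
         * issues no locked memory operations.
         */
        while (atomic_load_explicit(&sp->locked, memory_order_relaxed) != 0)
            ;   /* a real kernel would likely add a cpu pause hint here */

        /* The lock looked free: only now attempt the atomic swap. */
        if (atomic_exchange_explicit(&sp->locked, 1,
                                     memory_order_acquire) == 0)
            return;
    }
}

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static int sketch_spin_trylock(struct sketch_spinlock *sp)
{
    if (atomic_load_explicit(&sp->locked, memory_order_relaxed) != 0)
        return 0;
    return atomic_exchange_explicit(&sp->locked, 1,
                                    memory_order_acquire) == 0;
}

static void sketch_spin_unlock(struct sketch_spinlock *sp)
{
    atomic_store_explicit(&sp->locked, 0, memory_order_release);
}

/* Pool of shared locks indexed by a hash of the protected address. */
#define SKETCH_POOL_SIZE 8192       /* must be a power of two */
static struct sketch_spinlock sketch_pool[SKETCH_POOL_SIZE];

static struct sketch_spinlock *sketch_pool_lookup(const void *addr)
{
    uintptr_t h = (uintptr_t)addr;

    h ^= h >> 13;                   /* arbitrary mixing step (assumption) */
    return &sketch_pool[(h >> 4) & (SKETCH_POOL_SIZE - 1)];
}
```

A caller would look up the pool lock for an object's address, take it
with sketch_spin_lock(), and drop it with sketch_spin_unlock(). A larger
pool makes it less likely that two unrelated objects hash to the same
lock.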
Summary of changes:
sys/kern/kern_spinlock.c | 97 ++++++++++++++++++++++++++++++++--------------
1 files changed, 68 insertions(+), 29 deletions(-)