git: kernel - Improve exec performance
dillon at crater.dragonflybsd.org
Wed Feb 22 10:50:26 PST 2017
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Wed Feb 22 10:43:00 2017 -0800
kernel - Improve exec performance
* Improves non-shared 32-way-concurrent exec performance for a small
static binary on the Xeon from 92KE/s (92000 execs/sec) to 136KE/s.
* Improves single-threaded test performance from ~4.5KE/s to ~6.5KE/s.
And for reasons I don't entirely understand, sometimes up to ~8KE/s.
* Several changes here, but the only one that matters for the test is
that the pv_placemarker_wakeup() code removes a spin_lock/spin_unlock
pair on the pmap. I adjusted the code so the pmap spinlock is not
required for placemarker wakeup operations.
What I think might have happened here is that this removal also got
rid of a spinlock shared/exclusive ping-pong. Still, the huge
improvement in performance was not expected. Even with the removal
there is still an atomic_swap_long() in the code path.
My guess is that multiple atomic ops degrade the instruction pipeline
more than one would otherwise expect due to the multiple memory
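As a rough illustration of the idea described above (releasing a placemarker with a single atomic swap rather than under a spinlock), here is a minimal user-space sketch in C11. It is not the pmap.c code: every name in it (SLOT_FREE, SLOT_WAITER, slot_try_claim, slot_mark_waiter, slot_release, wakeups_issued) is hypothetical, C11 atomic_exchange() stands in for the kernel's atomic_swap_long(), and a counter stands in for a real wakeup call. It also assumes valid indices fit below bit 63.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: all-ones means "slot free"; bit 63 means
 * "someone is sleeping on this slot". Valid indices stay below bit 63. */
#define SLOT_FREE   UINT64_MAX
#define SLOT_WAITER ((uint64_t)1 << 63)

/* Stand-in for a real wakeup call; counts wakeups that would be issued. */
static atomic_int wakeups_issued;

/* Claim the slot for `index` if it is currently free. */
static bool slot_try_claim(_Atomic uint64_t *slot, uint64_t index)
{
    uint64_t expected = SLOT_FREE;
    return atomic_compare_exchange_strong(slot, &expected, index);
}

/* A would-be waiter flags the slot so the holder knows to wake it.
 * Returns false if the slot was already free (nothing to wait for). */
static bool slot_mark_waiter(_Atomic uint64_t *slot)
{
    uint64_t cur = atomic_load(slot);
    while (cur != SLOT_FREE) {
        if (atomic_compare_exchange_weak(slot, &cur, cur | SLOT_WAITER))
            return true;
        /* A failed CAS reloaded `cur`; loop and retry. */
    }
    return false;
}

/* Release with one atomic swap and no lock. The old value tells us
 * whether a waiter flagged the slot. Returns true if a wakeup was due. */
static bool slot_release(_Atomic uint64_t *slot)
{
    uint64_t old = atomic_exchange(slot, SLOT_FREE);
    assert(old != SLOT_FREE);   /* releasing a free slot is a bug */
    if (old & SLOT_WAITER) {
        atomic_fetch_add(&wakeups_issued, 1);
        return true;
    }
    return false;
}
```

The point of the sketch is only that the release path touches shared state once: the swap both frees the slot and reports whether a waiter needs waking, so no separate lock/unlock pair is needed around it. A real kernel would also need an interlock on the sleep side to avoid lost wakeups, which is omitted here.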
Summary of changes:
sys/platform/pc64/x86_64/pmap.c | 54 +++++++++++++++++++++++++++++------------
1 file changed, 39 insertions(+), 15 deletions(-)