Update on recent SMP contention work
dillon at apollo.backplane.com
Wed Oct 16 23:34:44 PDT 2013
A whole lot more work on reducing SMP contention has gone into master
recently and will be in the upcoming release:
* Name cache shared lock fix.  Most concurrent path lookups are now
  non-contending through the entire code stack.

* More use of shared spinlocks in the pmap code (+ fixes).  Most
  concurrent VM faults are now non-contending through the entire
  code stack.

* Filesystem syncer improvements.  The syncer now tracks vnodes with
  dirty inodes and vnodes with possibly dirty VM pages (via mmap), in
  addition to vnodes with dirty buffer cache buffers.  nfs, tmpfs,
  and hammer now support a mechanism to scan only the tracked vnodes
  instead of scanning all vnodes.  This makes 'sync' and the automatic
  filesystem syncer much more efficient.

* Fork and fork/exec code paths are now vastly more efficient due to
  greatly reduced lock contention.  This is primarily driven by avoiding
  unnecessary tracking of VM shadow chains on terminal vnodes (which
  are almost always the executable binary), allowing shared locks to be
  used for terminal vnodes during a fork or exec.

* The per-cpu process reaper (handles exit/wait) now uses a per-cpu
  token rather than a global token.

* Various pid-related improvements, such as removing the totally
  unnecessary acquisition of a global token when looking up your
  own process pid.

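To illustrate the syncer item above: the gain comes from keeping a side set of dirty vnodes so that a sync only visits those, rather than walking every vnode on the mount. What follows is a minimal user-space sketch of that idea in Python. The `Vnode` and `Mount` classes and all of their methods are hypothetical names made up for illustration; they are not the kernel's structures and this is not how the actual syncer is written.

```python
class Vnode:
    """Toy stand-in for a vnode; only the dirty flag matters here."""

    def __init__(self, ident):
        self.ident = ident
        self.dirty = False


class Mount:
    """Toy mount point that keeps a side set of dirty vnode ids (hypothetical)."""

    def __init__(self, nvnodes):
        self.vnodes = [Vnode(i) for i in range(nvnodes)]
        self.dirty_set = set()  # ids of vnodes known to be dirty

    def mark_dirty(self, ident):
        """Dirty a vnode and record it in the tracking set."""
        self.vnodes[ident].dirty = True
        self.dirty_set.add(ident)

    def sync_full_scan(self):
        """Visit every vnode. Returns (flushed ids, number of vnodes visited)."""
        flushed = []
        for vn in self.vnodes:
            if vn.dirty:
                vn.dirty = False
                flushed.append(vn.ident)
        self.dirty_set.clear()
        return flushed, len(self.vnodes)

    def sync_tracked(self):
        """Visit only tracked vnodes. Returns (flushed ids, number visited)."""
        idents = sorted(self.dirty_set)
        for ident in idents:
            self.vnodes[ident].dirty = False
        self.dirty_set.clear()
        return idents, len(idents)


if __name__ == "__main__":
    m = Mount(100000)
    for i in (3, 70, 4242):
        m.mark_dirty(i)
    flushed, visited = m.sync_tracked()
    print("flushed", flushed, "after visiting", visited, "vnodes")
```

With three dirty vnodes out of 100000, the tracked scan touches three entries while the full scan would touch all of them, which is the whole point of the change as described.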
The gist of this work is that there is now virtually no
contention for most process-related activities, including heavy use
of fork and fork/exec in 'make', '/bin/sh', and other utilities.
Anything which forks and/or execs a lot (scripts, bulk builds, service
daemons, etc) will now run as close to optimally as it is possible to
run on a multi-core box.
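For anyone who wants to poke at fork/exec scaling on their own box, here is a rough sketch of a concurrent fork/exec loop in Python. It is not the workload behind the numbers in this post, and interpreter overhead will inflate the per-iteration cost, so treat the result as a relative figure to compare across worker counts or kernels. The function names (`fork_exec_worker`, `run`) and the choice of `/bin/sh -c :` as the exec target are assumptions for illustration only.

```python
import os
import time


def fork_exec_worker(count, argv):
    """Fork and exec `argv` `count` times, waiting for each child in turn."""
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            try:
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # only reached if the exec itself failed
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError("child failed with wait status %d" % status)


def run(workers, count, argv):
    """Run `workers` concurrent fork/exec loops; return fork/execs per second."""
    start = time.monotonic()
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                fork_exec_worker(count, argv)
                status = 0
            finally:
                os._exit(status)  # never return into the parent's code path
        pids.append(pid)
    failed = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if status != 0:
            failed += 1
    elapsed = time.monotonic() - start
    if failed:
        raise RuntimeError("%d of %d workers failed" % (failed, workers))
    return workers * count / elapsed


if __name__ == "__main__":
    ncpu = os.cpu_count() or 1
    for workers in (1, ncpu):
        rate = run(workers, 200, ["/bin/sh", "-c", ":"])
        print("%d worker(s): %.0f fork/exec per second" % (workers, rate))
```

If the rate with one worker per cpu is close to the single-worker rate multiplied by the cpu count, the fork/exec path is scaling well; a flat or falling rate suggests contention somewhere.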
In particular, with the last change to the namecache code, our bulk
ports builds look pretty insane on monster (our 48-core Opteron box).
Now during a bulk dports build, the load can pop up to 300 with concurrent
compiles and of that 300 there will be 295 non-contending "R"un state
processes and only 5 contending "D" state processes. And it all happens
with virtually *NO* IPI traffic between cpus.
I consider this a fairly major milestone for the project. We aren't
finished, but this is a major leap in our ability to fully utilize the
resources on larger multi-core systems.
<dillon at backplane.com>