MPSAFE progress and testing in master

niklasro.appspot.com niklasro at gmail.com
Mon Dec 28 00:34:21 PST 2009


2009/12/28 Matthew Dillon <dillon at apollo.backplane.com>:
>    As I said earlier, the development tree is going to be in a bit of flux
>    this week as I do another push on the MPSAFE work.  I expect some
>    instability.
>
>    Currently there is a known bug with programs getting stuck in poll
>    (but from earlier commits before Christmas, not recent ones).
>
>    The latest push adds fine-grained locking to the namecache and path
>    lookups and adds a sysctl vfs.cache_mpsafe to turn off the MP lock
>    for [f,l]stat() and open().  This push also removes a good chunk of
>    the MP lock contention from standard vnode referencing and dereferencing
>    operations all the way through to and including the HAMMER filesystem
>    code.
>
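For reference, a minimal sketch of flipping that knob from C via
sysctlbyname(3); the usual way is just "sysctl vfs.cache_mpsafe=1" from
the command line, and the assumption that 1 selects the MP-lock-free
path is mine.

    /*
     * Sketch only: toggle vfs.cache_mpsafe programmatically.
     * Assumes the knob is a boolean int where 1 requests the
     * MP-lock-free [f,l]stat()/open() path described above.
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
            int cur, on = 1;
            size_t len = sizeof(cur);

            /* Report the current setting. */
            if (sysctlbyname("vfs.cache_mpsafe", &cur, &len, NULL, 0) == 0)
                    printf("vfs.cache_mpsafe is currently %d\n", cur);

            /* Request the MP-lock-free path (assumption: 1 = on). */
            if (sysctlbyname("vfs.cache_mpsafe", NULL, NULL, &on,
                sizeof(on)) != 0)
                    perror("vfs.cache_mpsafe");
            return (0);
    }
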
>    Earlier mpsafe sysctls for read(), write(), and getattr (stat)
>    operations are now the default and the sysctls have been removed.
>    This applies to HAMMER filesystems only.
>
>    I was able to push over 1M uncontended namei calls/sec on
>    my Phenom II test box and over 612K partially contended namei
>    calls/sec, timing 100,000 stat() calls in a loop on four different
>    filenames (one for each cpu).  The per-cpu run times are as follows
>    with different combinations of path names (with differing levels of
>    contention based on common path elements):
>
>        on /usr/obj/test{1,2,3,4}
>
>        0.590 seconds with 1 instance running,   512K namei/sec
>        0.865 seconds with 2 instances running,  700K namei/sec
>        1.315 seconds with 3 instances running,  700K namei/sec
>        2.122 seconds with 4 instances running,  612K namei/sec
>
>        on /usr/obj/dir{1,2,3,4}/test                   (less contention)
>
>        0.740 seconds with 1 instance running,   544K namei/sec
>        1.013 seconds with 2 instances running,  793K namei/sec
>        1.260 seconds with 3 instances running,  967K namei/sec
>        1.695 seconds with 4 instances running,  955K namei/sec
>
>        cd /usr/obj/dir{1,2,3,4} and run on 'test'      (no contention)
>                                                        (short path)
>
>        0.281 seconds with 1 instance running,   358K namei/sec
>        0.331 seconds with 2 instances running,  608K namei/sec
>        0.340 seconds with 3 instances running,  885K namei/sec
>        0.351 seconds with 4 instances running, 1145K namei/sec <--- 1M+
>
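The actual benchmark source isn't included in the mail; a rough sketch
of the kind of loop being timed (100,000 stat() calls on one path per
instance) might look like the following, with one instance started per
cpu on its own filename.  The program name and structure are guesses.

    /* statloop.c -- hedged sketch, not the original benchmark. */
    #include <sys/stat.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
            struct stat st;
            int i;

            if (argc != 2) {
                    fprintf(stderr, "usage: statloop path\n");
                    exit(1);
            }
            /* 100,000 stat() calls on the same path, as in the mail. */
            for (i = 0; i < 100000; ++i) {
                    if (stat(argv[1], &st) < 0) {
                            perror(argv[1]);
                            exit(1);
                    }
            }
            return (0);
    }

Timing each instance with time(1) gives the per-instance seconds above;
the namei/sec figures appear to count one lookup per path component,
e.g. 3 components x 100,000 / 0.590 s is roughly 508K for the
single-instance /usr/obj/test1 case.
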
>
>    Parallel file reads also work quite well versus previously.  Reading
>    a 512KB file 10,000 times in a loop and timing it yields only a
>    minor loss of performance per-cpu, meaning total parallel performance
>    has a good multiplier and is very good:
>
>        0.707 seconds with 1 instance running
>        0.849 seconds with 2 instances running
>        0.949 seconds with 3 instances running
>        1.005 seconds with 4 instances running
>
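Similarly, a rough sketch of the read test (a 512KB file read 10,000
times per instance); whether the original test reopened the file each
pass or used separate files per cpu isn't stated, so this version
simply opens once and rewinds.

    /* readloop.c -- hedged sketch, not the original benchmark. */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
            char buf[65536];
            ssize_t n;
            int fd, i;

            if (argc != 2) {
                    fprintf(stderr, "usage: readloop file\n");
                    exit(1);
            }
            if ((fd = open(argv[1], O_RDONLY)) < 0) {
                    perror(argv[1]);
                    exit(1);
            }
            /* Read the whole file 10,000 times, rewinding between passes. */
            for (i = 0; i < 10000; ++i) {
                    if (lseek(fd, 0, SEEK_SET) < 0) {
                            perror("lseek");
                            exit(1);
                    }
                    while ((n = read(fd, buf, sizeof(buf))) > 0)
                            ;
                    if (n < 0) {
                            perror("read");
                            exit(1);
                    }
            }
            close(fd);
            return (0);
    }
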
>    buildworld times have improved a tiny bit, from around 1230 to
>    around 1195, but they will oscillate around as I continue the work
>    because there are numerous duplicate locks in the syscall paths at
>    the moment.  Buildworld has other bottlenecks, of course.  Mainly
>    I'm just using buildworld times to make sure I haven't created a
>    performance regression.  I did notice the -j 8 buildworld hit over
>    450K namei/sec in systat.
>
>    I am going to spend this week stabilizing the new code before
>    moving on to the next low-hanging fruit (probably vm_fault).
>
>                                        -Matt
>                                        Matthew Dillon
>                                        <dillon at backplane.com>
>
Affirming the magnificent news; running buildkernel now.  My reasoning
is that the real question is not realtime but asymptotic ("ordo") time,
since hardware can push even the buggiest codebase to being the
fastest.  One ordo notation that is rarely listed is O(1/n), tending to
0, which should apply to distributed computing: more load enables more
distributed capacity, while mechanically it is the other way round.  I
realize it's difficult to explain storing 20 different bits in 14,
applied combinatorically.
greets, happy HammerFS tester/user Niklas
