Benchmarking

Matthew Dillon dillon at apollo.backplane.com
Sun Mar 28 10:09:57 PST 2004


:I finally got some time to do DragonFly; the (somewhat surprising) 
:results can be seen here: http://geri.cc.fer.hr/~ivoras/osbench/osbench.sxc
:
:It seems that DF could do a lot better but there's something holding it 
:back (especially in the web CMS test). I got a lot of 'too many database 
:connections' errors, like the db server doesn't get the chance to free 
:the connection when it's closed so that the next client can use it. 
:Also, under DF a bug in a program was revealed: an object was not 
:protected by a mutex/lock like it should be. It is interesting that the 
:bug had not manifested on all the other tested systems, but on DF it was 
:consistent and frequent (maybe the process-switching rate in DF is higher?).
:
:(This is work in progress, and I will create a world-readable report in 
:PDF or HTML when it's finished. Until then, the above document will be 
:the only published data).

    Yes, the recent scheduler changes were actually a little too sensitive
    when it came to preemptive switching of user processes when returning
    from a system call.  I actually softened it up a bit last night.  Forked
    children are also given a lower initial priority, so the initial spread
    of cpu over the process set is very different.  This combination seems
    to do a good job of bringing out MP bugs, though that wasn't the intention!
    When I first made the scheduler changes all my -j N buildworlds started
    failing (due to missing dependencies in the Makefiles)!

    'Too many database connections' errors sound like something that could
    be tuned, but without more information it's hard to guess at what the
    problem is.  Maybe a soft file descriptor limit is being reached or
    something.  The CMS transaction rate issue is probably either related
    to the reported error creating issues, or it is related to the over
    active scheduler (which hopefully was fixed last night).  The other
    CMS numbers look right.
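    One quick thing to check for the connection errors is the per-process
    descriptor limit.  A sketch (hypothetical diagnostic, using Python's
    resource module, which just wraps getrlimit/setrlimit):

```python
import resource

# Query the soft and hard per-process file descriptor limits.  A server
# that accepts many database connections can run into the soft limit
# long before the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft fd limit:", soft)
print("hard fd limit:", hard)

# Raise the soft limit up to the hard limit for this process.
# (Raising the hard limit itself requires root.)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

    If the soft limit turns out to be small, raising it (or the equivalent
    login-class/sysctl tuning) would be the first thing to try before
    blaming the kernel.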

    The ByteBenches are generally a function of the compiler.  FreeBSD-5 is
    using GCC3 by default and on modern hardware it tends to produce better
    code, hence the Dhrystone results.  The execl throughput is a little
    low; I was hoping it would be at least as good as 4.9, but at least it's
    better than 5.x :-).  The scheduler fixes *might* improve those numbers.
    The pipe-based context switching looks reasonable.  I actually would have
    expected FreeBSD-5 to win here because their PIPE code is totally
    giant-free.  The shell script performance is rather odd, I'm not sure
    I believe the FreeBSD-5.2-CUSTOM number there.

    Bonnie++ looks in line with expectations.  DragonFly should generally have
    similar performance to 4.9 (i.e. better than 5.x).  I'm not sure what
    is going on with the Per-Char numbers but it isn't something we would
    normally care about.  NetBSD is obviously faking something there (probably
    doing some caching even when told not to).  Uncachable VFS operations
    (Sequential Create, Random Create) are going to be a bit slower on
    DFly versus FreeBSD-4.9 due to the serializing token overhead.  I'm
    actually a bit surprised that DragonFly is beating out FreeBSD-5.x
    there, perhaps FreeBSD-5.x is not being compiled with the same filesystem
    optimizations (like UFS_DIRHASH).  I may have bumped up the cache limits
    for some of them in DragonFly and it just happens to fit the dataset.
    Someone in FreeBSD-5 land should probably investigate the low numbers.
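    For reference, dirhash is a build-time kernel option, so whether the
    FreeBSD-5.2-CUSTOM kernel has it would show up in its config file.
    Something like:

```
# kernel configuration fragment -- directory hashing is only compiled
# in if this option is present:
options         UFS_DIRHASH             # hash large directories for fast lookups
```

    A custom kernel config that dropped this line would explain poor
    create/lookup numbers on big directories.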

    Note, however, that DragonFly's %CPU numbers scale very well to the
    bench results.  This seems to indicate that the issues are
    concurrency/blocking related rather than hoggy code.

    I don't know what ubench is doing, I would expect that since it is
    a userland cpu-bound program that the numbers would be tied to the
    compiler and thus similar to 4.9.  The numbers aren't bad, just not
    expected given the uniformity of the results from the other OSs. 
    If it is taking a lot of VM faults then this could actually be related
    to a recent pmap bug fix that is in DFly but I think was put into
    FreeBSD-5 after the 5.2 release.

    The PG TPS numbers look about right.  What you are seeing is SMP
    overhead.  FreeBSD-4.9 is probably serializing/batching the operations
    more (which always makes raw TPS numbers look better), but if so this
    is normally not observable unless you also measure the standard deviation
    of the transaction latency.  That's why raw TPS numbers make for bad
    benchmarks.  It's just too easy for a broken scheduler to revert to
    batching and make them look better than they really are.
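    To illustrate the point (a made-up sketch, not measurement data from
    the benchmark -- both latency traces are hypothetical):

```python
from statistics import stdev

# Two hypothetical per-transaction latency traces (seconds).  The
# "batched" trace does the same number of transactions in slightly
# less wall time, so its raw TPS looks *better*, but individual
# transactions see wildly uneven latency.
smooth  = [0.010, 0.011, 0.009, 0.010, 0.010, 0.010, 0.011, 0.009]
batched = [0.002, 0.002, 0.030, 0.002, 0.002, 0.030, 0.002, 0.002]

for name, trace in (("smooth", smooth), ("batched", batched)):
    tps = len(trace) / sum(trace)      # raw transactions per second
    jitter = stdev(trace)              # latency standard deviation
    print(f"{name:8s} TPS={tps:7.1f}  latency stddev={jitter:.4f}s")
```

    The raw TPS number alone would rank the batching scheduler higher;
    only the latency standard deviation exposes what is really happening.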

    CMS: already covered.  I'd be interested in knowing whether rerunning
    that test with the latest kernel (and after figuring out what is causing
    the error messages) improves the numbers any.

    Buildworld tests?  Building the same world or each project's own worlds?
    You can't really compare buildworld times because the projects have vastly
    different data set sizes.  For example, DragonFly rebuilds a lot more
    when you run 'buildworld' than FreeBSD... it's rebuilding the entire tool
    set rather than just some of the tools, and it's building two different
    compilers instead of one.  It's going to take longer, generally.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>


More information about the Kernel mailing list