Blogbench RAID benchmarks

Mon Jul 18 19:08:53 PDT 2011

    Ok, well this is interesting.  Basically it comes down to whether we
    want to starve read operations or whether we want to starve write 
    operations.

    The FreeBSD results starve read operations, while the DragonFly results
    starve write operations.  That's the entirety of the difference between
    the two tests.

    The final numbers don't do justice to this, but it is apparent if you
    look at the raw numbers.  When the blogbench test blows out the system
    caches, the read activity on FreeBSD drops into the ~600 range while on
    DragonFly the read activity drops to the ~25000 range.  At the same
    time FreeBSD's write activity stays in the ~4000 range while
    DragonFly's write activity drops into the ~50 range.

    I tracked down the reason for the DragonFly write activity dropping.
    It basically comes down to the backlog of inodes in HAMMER needing
    reclamation.  Due to the heavy concurrent read load the HAMMER flusher
    is constantly stuck in B-Tree locks and cannot flush inode meta-data
    out quickly enough to keep up with blogbench.  Once it hits the inode
    backlog limit (25000) write throughput goes down drastically.

    While one can increase the limit (vfs.hammer.limit_reclaim), all that
    happens is that HAMMER takes a little longer before it hits it, at
    least in the blogbench test.  For more bursty bulk write operations
    increasing the limit would be a good tuning parameter.
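
    As a rough illustration of why raising vfs.hammer.limit_reclaim only
    buys time under a sustained load, here is a toy linear model in Python.
    It is a sketch, not HAMMER's actual accounting: the function name and
    both rates are made up for illustration, and only the 25000 limit comes
    from the text above.

```python
def seconds_until_limit(limit, create_rate, flush_rate):
    """Toy model: seconds until the inode backlog reaches `limit`.

    create_rate and flush_rate are inodes/second.  Returns None when the
    flusher keeps up and the backlog never grows.
    """
    net_growth = create_rate - flush_rate
    if net_growth <= 0:
        return None
    return limit / net_growth


# Hypothetical rates, NOT measured values: the benchmark dirties inodes
# faster than a lock-starved flusher can reclaim them.
CREATE_RATE = 3000.0
FLUSH_RATE = 500.0

for limit in (25000, 50000, 100000):
    t = seconds_until_limit(limit, CREATE_RATE, FLUSH_RATE)
    print(f"limit_reclaim={limit:>6}: limit reached after ~{t:.0f}s")
```

    In this model a larger limit moves the point where the limit is hit
    but never removes it as long as the load is sustained.  A burst that
    ends before that point never reaches the limit at all, which is the
    case where raising it helps.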

    Frankly both FreeBSD's and DragonFly's results are incorrect.  FreeBSD
    is killing read performance way way way too much while DragonFly is
    killing write performance way way way too much.

    I'm not sure how it could be fixed, though.  I can definitely reduce
    B-Tree deadlocks in HAMMER by unlocking B-Tree nodes during synchronous
    read I/O (for meta-data), but the result that we really want is more
    balanced read vs write performance, not these extreme tilts that we see.

    Also note that blogbench's 'final' results are worthless.  The read
    performance is mostly counting the pre-cache-blowout numbers.  DragonFly's
    read performance is 41x FreeBSD's once the caches are blown out,
    while FreeBSD's write performance is 80x DragonFly's write performance
    once the caches are blown out.  Reads tend to be less localized than
    writes so, generally speaking, the disk bandwidth *IS* being used fairly
    efficiently in both cases.  But neither result is really acceptable
    IMHO.
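
    The 41x and 80x figures follow directly from the approximate
    post-blowout numbers quoted earlier (~600 vs ~25000 for reads, ~4000
    vs ~50 for writes).  A small Python sketch of that arithmetic, using
    those rounded values rather than the raw blogbench output (the
    dictionary layout and variable names are just for illustration):

```python
# Approximate post-cache-blowout activity, rounded as quoted in the text.
POST_BLOWOUT = {
    "FreeBSD":   {"read": 600,   "write": 4000},
    "DragonFly": {"read": 25000, "write": 50},
}

# How many times more reads DragonFly sustains than FreeBSD.
read_ratio = POST_BLOWOUT["DragonFly"]["read"] / POST_BLOWOUT["FreeBSD"]["read"]

# How many times more writes FreeBSD sustains than DragonFly.
write_ratio = POST_BLOWOUT["FreeBSD"]["write"] / POST_BLOWOUT["DragonFly"]["write"]

print(f"DragonFly/FreeBSD reads:  {read_ratio:.1f}x")
print(f"FreeBSD/DragonFly writes: {write_ratio:.1f}x")
```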

    This is all with swapcache turned off.  The only way to test in a
    fair manner with swapcache turned on (with an SSD) is if the FreeBSD
    test used a similar setup w/ZFS.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>