Hammer Benchmark Fun
Matthew Dillon
dillon at apollo.backplane.com
Sat Jan 8 11:52:48 PST 2011
Well, the numbers are all over the place and some of them don't make
any sense, which kinda implies cockpit trouble somewhere.
The blogbench tests make some sense, but people will get a false
sense of performance (or lack thereof) because the authors don't
quite understand how blogbench works. Blogbench uses an ever-growing
data set, so if write performance is horrible the concurrent reads
will all fit in the buffer cache, because the data set simply does
not grow very much during the test. The read performance will of
course be very high in that case because it won't be going to disk at
all. If write performance is good then read performance is going to
suffer greatly, simply because the data set winds up much larger (due
to the improved write performance) and thus typically does not fit in
the buffer cache.
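To make that feedback loop concrete, here is a rough sketch (my own
illustration, not the actual blogbench source) of what blogbench
effectively does: a writer keeps growing the data set while readers
pick random existing files. If the writer is slow, nfiles stays
small, the whole set stays in the buffer cache, and the reads never
touch the disk:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <pthread.h>

    #define FILESZ  (64 * 1024)

    static volatile int nfiles;     /* grows for the whole run */
    static volatile int done;

    static void *writer(void *arg)
    {
        char path[64], buf[FILESZ];

        (void)arg;
        memset(buf, 'x', sizeof(buf));
        while (!done) {
            snprintf(path, sizeof(path), "blog.%d", nfiles);
            int fd = open(path, O_CREAT | O_WRONLY, 0644);
            if (fd >= 0) {
                write(fd, buf, sizeof(buf));
                close(fd);
                ++nfiles;       /* readers now see a larger set */
            }
        }
        return NULL;
    }

    static void *reader(void *arg)
    {
        char path[64], buf[FILESZ];

        (void)arg;
        while (!done) {
            if (nfiles == 0)
                continue;
            snprintf(path, sizeof(path), "blog.%d", rand() % nfiles);
            int fd = open(path, O_RDONLY);
            if (fd >= 0) {
                /* a cache hit if the data set stayed small */
                read(fd, buf, sizeof(buf));
                close(fd);
            }
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t wr, rd;

        pthread_create(&wr, NULL, writer, NULL);
        pthread_create(&rd, NULL, reader, NULL);
        sleep(30);              /* fixed-length run, like blogbench */
        done = 1;
        pthread_join(wr, NULL);
        pthread_join(rd, NULL);
        printf("data set grew to %d files (%lld MB)\n",
            nfiles, (long long)nfiles * FILESZ / (1024 * 1024));
        return 0;
    }

The point is that the read numbers come out as a function of how far
nfiles gets, i.e. as a function of write performance, not of read
performance.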
Another major problem with this test suite (and the other tests too)
is whether e.g. ZFS is doing compression or data de-dup or not. Most
filesystem benchmark programs do NOT write random data into files.
They typically write all zeros, or the same repeating pattern.
Needless to say, when you write files full of zeros or the same
pattern to a filesystem which does compression and/or de-dup, you are
going to see very, VERY high apparent performance. But it isn't real
performance, because real-life activity is not writing all zeros to
files.
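The fix is cheap: fill the write buffers from a PRNG so compression
and de-dup have nothing to collapse. A minimal sketch of the idea
(the xorshift generator is an arbitrary choice on my part, anything
incompressible works and /dev/urandom is just slower):

    #include <stdint.h>
    #include <unistd.h>
    #include <fcntl.h>

    static uint64_t state = 88172645463325252ULL;

    static uint64_t xorshift64(void) /* cheap incompressible stream */
    {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }

    int main(void)
    {
        uint64_t buf[8192];             /* 64KB write buffer */
        int fd = open("testfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);

        if (fd < 0)
            return 1;
        for (int i = 0; i < 16384; ++i) {   /* 16384 x 64KB = 1GB */
            for (size_t j = 0; j < sizeof(buf) / sizeof(buf[0]); ++j)
                buf[j] = xorshift64();
            /*
             * memset(buf, 0, sizeof(buf)) here instead would let a
             * compressing filesystem turn the whole gigabyte into a
             * handful of metadata updates.
             */
            write(fd, buf, sizeof(buf));
        }
        close(fd);
        return 0;
    }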
In particular, I just have to question the BTRFS and ZFS numbers for
most of these tests. From the looks of it, BTRFS is hardly going to
disk at all. That kinda implies its built-in compression is
trivializing the data set being written by the benchmark programs,
skewing the results badly.
The gzip tests make no sense. gzip is cpu bound. It looks to me like
the Linux tests are running an optimized version of gzip. This isn't
testing the filesystem at all.
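An easy way to check is to compare the cpu time against the wall time
of the run: if user+sys accounts for nearly all of real, the disk
never entered the picture. A sketch of that measurement using
wait4(), which is essentially what time(1) already does:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <sys/wait.h>

    int main(int argc, char **argv)
    {
        struct timeval t1, t2;
        struct rusage ru;
        int status;
        pid_t pid;

        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args]\n", argv[0]);
            return 1;
        }
        gettimeofday(&t1, NULL);
        if ((pid = fork()) == 0) {
            execvp(argv[1], argv + 1);
            _exit(127);
        }
        wait4(pid, &status, 0, &ru);    /* collects child cpu usage */
        gettimeofday(&t2, NULL);

        double real = (t2.tv_sec - t1.tv_sec) +
                      (t2.tv_usec - t1.tv_usec) / 1e6;
        double user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
        double sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
        printf("real %.2fs user %.2fs sys %.2fs\n", real, user, sys);
        return 0;
    }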
The postmark tests... you have to look at whether postmark is issuing
fsync() calls, and then you have to look at the fsync handling mode
for each filesystem. It's sad to say, but most filesystems do NOT
handle fsync() properly. So if you are going to run those sorts of
tests you have to be cognizant of the issue and at least set the
filesystems up to run fsync the same way, so the tests are more
realistic. The content of the files must also be checked. And,
again, the ext4 and btrfs numbers look just plain wrong. There's
something going on in there that is shortcutting the test.
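A quick sanity check I'd suggest (a sketch of my own, not part of
postmark) is to time a write+fsync loop. On a 7200rpm disk an honest
fsync costs at least one platter rotation, call it 8ms, so if the
average comes back in the microsecond range the filesystem is
acknowledging fsync() without actually committing the data:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/time.h>

    #define LOOPS 100

    int main(void)
    {
        char buf[4096];
        struct timeval tv1, tv2;
        int fd = open("fsync.test", O_CREAT | O_WRONLY | O_TRUNC,
                      0644);

        if (fd < 0)
            return 1;
        memset(buf, 'x', sizeof(buf));
        gettimeofday(&tv1, NULL);
        for (int i = 0; i < LOOPS; ++i) {
            pwrite(fd, buf, sizeof(buf), 0);  /* rewrite one block */
            fsync(fd);                  /* should hit the platter */
        }
        gettimeofday(&tv2, NULL);
        close(fd);

        double us = (tv2.tv_sec - tv1.tv_sec) * 1e6 +
                    (tv2.tv_usec - tv1.tv_usec);
        printf("avg fsync: %.0f us\n", us / LOOPS);
        return 0;
    }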
Compression in general is a very interesting dynamic that will
probably become more and more important as cpu power continues to
increase, particularly in MP environments. But if you are going to
run filesystem tests you need to be sure you are testing the same
thing on each filesystem and not hitting degenerate conditions due
to, e.g., the test data.
-Matt
Matthew Dillon
<dillon at backplane.com>