HAMMER update 19-June-2008 (56C) (HEADS UP - MEDIA CHANGED)

Thu Jun 19 23:32:12 PDT 2008

    56C represents an additional significant improvement in performance,
    plus bug fixes and most of the final media changes.

    As with all the commits this week, a kernel and utilities rebuild plus
    a newfs_hammer is needed to continue testing.

    The filesystem block size now increases from 16K to 64K once a file
    has grown past 1MB.  This improves write performance to the point
    where I don't really need to implement cluster_write(), so I've decided
    to forgo doing that for the release.
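
    To make the block size rule above concrete, here is a minimal C sketch.
    It is not HAMMER's code: the helper names are made up, and it assumes
    the switch is keyed on file offset, so the first 1MB of a file stays in
    16K blocks and only data past that point uses 64K blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the text above; the names are hypothetical. */
#define DEMO_SMALL_BLKSIZE   (16 * 1024)
#define DEMO_LARGE_BLKSIZE   (64 * 1024)
#define DEMO_LARGE_THRESHOLD (1024 * 1024)

/*
 * Block size used for the block containing file_offset.
 * ASSUMPTION: the 16K -> 64K switch depends only on the offset.
 */
static int
demo_blksize_for_offset(int64_t file_offset)
{
    assert(file_offset >= 0);
    if (file_offset < DEMO_LARGE_THRESHOLD)
        return DEMO_SMALL_BLKSIZE;
    return DEMO_LARGE_BLKSIZE;
}

/* Number of blocks needed to hold file_size bytes under that rule. */
static int64_t
demo_blocks_for_size(int64_t file_size)
{
    assert(file_size >= 0);
    if (file_size <= DEMO_LARGE_THRESHOLD)
        return (file_size + DEMO_SMALL_BLKSIZE - 1) / DEMO_SMALL_BLKSIZE;
    return DEMO_LARGE_THRESHOLD / DEMO_SMALL_BLKSIZE +
        (file_size - DEMO_LARGE_THRESHOLD + DEMO_LARGE_BLKSIZE - 1) /
        DEMO_LARGE_BLKSIZE;
}
```

    Under that assumption a 2MB file takes 64 small blocks plus 16 large
    ones, 80 in total, instead of 128 blocks at a uniform 16K.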

    I will be making one final media change on Friday and then HAMMER 
    development will go into testing & bug fixing mode until the release.
    This last media change will fix mtime and atime storage.  At the moment
    mtime/atime updates require generating UNDO records and, needless to say,
    they're expensive.  I will consider my options tomorrow but I think I
    am going to just not include those fields in the CRC so they can be
    updated asynchronously, without any UNDOs.
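
    One way to picture leaving mtime/atime out of the CRC is sketched below
    in C.  This is only an illustration under assumptions: the structure
    layout, the field names, and the choice of a plain CRC-32 are all
    hypothetical and are not the actual HAMMER media format.  The two time
    fields are placed at the end of the record and the CRC covers only the
    bytes in front of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inode record; NOT the real HAMMER on-media layout. */
struct demo_inode {
    uint64_t obj_id;
    uint64_t size;
    uint32_t mode;
    uint32_t nlinks;
    /* Everything from here on is deliberately left out of the CRC. */
    uint64_t mtime;
    uint64_t atime;
};

/* The CRC covers the record up to, but not including, mtime. */
#define DEMO_INODE_CRCSIZE offsetof(struct demo_inode, mtime)

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t
demo_crc32(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int k;

    assert(len == 0 || buf != NULL);
    for (i = 0; i < len; ++i) {
        crc ^= p[i];
        for (k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* CRC of the protected part of the inode only. */
static uint32_t
demo_inode_crc(const struct demo_inode *ip)
{
    return demo_crc32(ip, DEMO_INODE_CRCSIZE);
}
```

    With a layout like this, rewriting mtime or atime in place leaves the
    stored CRC valid, while a change to any protected field such as the
    size still changes the CRC.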

				    --
				 Stability
				    --

    I have really begun pounding the filesystem by running blogbench,
    buildworld -j 8, and fsx simultaneously on two test boxes.  I expect
    that any remaining bugs will be worked out over the next week or two.

				     --
				 Performance
				     --

    All performance work except for the atime/mtime issue is now complete.
    WYSIWYG.  HAMMER's performance is extremely good now, and its system
    cpu overhead has dropped to roughly the same as what we get from UFS
    (buildworlds run 610-620 seconds of system time for HAMMER, and
    610-620 seconds of system time for UFS).

    HAMMER is now able to sustain full disk bandwidth for bulk reads and
    writes.  HAMMER continues to have far superior random-write performance,
    whether the system caches are blown out or not.  Not only that but
    the performance can potentially improve even more if I redo the 
    deadlock avoidance algorithms.  HAMMER is within 10% of UFS's read
    performance under light and medium loads.

    HAMMER has a somewhat larger system cache footprint than UFS.  After
    extensive testing with blogbench I've determined that HAMMER's
    read performance figures past blog 250 (where the system caches get
    blown out on my 1G test box) would actually be almost as good as
    UFS's *IF* HAMMER's write performance were to drop to the same levels
    as UFS's (poor) write performance past that point.

    But because HAMMER's write performance doesn't drop, the system cache
    is never able to settle down into a 95-percentile cached data set.
    Basically the only reason UFS has good read performance numbers for
    blogbench once the system caches are blown out is because UFS's
    write performance is so poor the data set is no longer growing
    significantly and no longer eating away at the cache.

    HAMMER's random re-writing performance does drop a bit relative to
    UFS, primarily due to HAMMER's history retention mechanic.  It isn't
    too bad and pruning/reblocking cleans it up so we're gonna have to
    run with it for the release.

    I will be working on the footprint size a bit, but I am very happy with
    the current state of affairs.

				     --
				 Release TODO
				     --

    There are many auxiliary items I want to get fully working for the
    release.  There are some minor issues with the reblocker and pruner,
    some issues with how to recover space after the filesystem has filled
    up, plus I want to write a recovery program for catastrophic failures
    (not a fsck, but a way to extract whatever good information can be
    found from a corrupted HAMMER filesystem).  I will also probably be
    making other adjustments to the filesystem... nothing I expect to
    mess up media compatibility past tomorrow, but to help support future
    features such as mirroring, better low level storage allocation, and
    so forth.

				     --
				 Mirroring 
				     --

    I am not going to promise it, but there is a slight chance I will be
    able to get mirroring working by the release.  I figured out how to
    do it, finally.  Basically the solution is to add another field to
    the B-Tree's internal elements... the 'most recent' transaction id,
    and to propagate it up all the way to the root of the tree.  The
    mirroring code can then optimally scan the B-Tree and pick out all
    records that have changed relative to some transaction id, allowing
    it to quickly 'pick up' where it left off and construct a record-level
    mirror over a fully asynchronous link, without any queueing.  You can't
    get much better than that, frankly.
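
    The scan described above can be sketched in C roughly as follows.  This
    is a toy in-memory tree, not the HAMMER B-Tree: the structure, the
    fanout, and all function names are hypothetical, and for simplicity
    the 'most recent' transaction id is stored on each node rather than in
    the parent's element for that node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FANOUT 4

/* Toy tree node; NOT the real HAMMER B-Tree node. */
struct demo_node {
    uint64_t max_tid;                     /* most recent tid in subtree */
    int is_leaf;
    int count;                            /* children or records in use */
    struct demo_node *child[DEMO_FANOUT]; /* used by internal nodes */
    uint64_t rec_tid[DEMO_FANOUT];        /* used by leaves */
};

/*
 * Recompute max_tid bottom-up for the whole tree.  A real filesystem
 * would presumably update it incrementally as records are modified.
 */
static uint64_t
demo_propagate(struct demo_node *n)
{
    uint64_t m = 0;
    int i;

    assert(n->count <= DEMO_FANOUT);
    for (i = 0; i < n->count; ++i) {
        uint64_t t = n->is_leaf ? n->rec_tid[i]
                                : demo_propagate(n->child[i]);
        if (t > m)
            m = t;
    }
    n->max_tid = m;
    return m;
}

/*
 * Count records modified after since_tid.  Subtrees whose max_tid is
 * not newer than since_tid are skipped without being descended into.
 * *visited counts the nodes actually examined.
 */
static int
demo_scan(const struct demo_node *n, uint64_t since_tid, int *visited)
{
    int found = 0;
    int i;

    if (n->max_tid <= since_tid)
        return 0;
    ++*visited;
    for (i = 0; i < n->count; ++i) {
        if (!n->is_leaf)
            found += demo_scan(n->child[i], since_tid, visited);
        else if (n->rec_tid[i] > since_tid)
            ++found;
    }
    return found;
}
```

    A mirroring pass would remember the highest transaction id it has
    already sent and hand it back as since_tid on the next pass, so
    unchanged subtrees cost nothing to skip.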

    I could go on and on, there's so much that can be done with this
    filesystem :-)

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>