HAMMER2 design in progress - 1-2 year time frame

Matthew Dillon dillon at apollo.backplane.com
Fri May 13 08:13:55 PDT 2011


:Hi,
:
:I'm a DFBSD lurker and I certainly wouldn't profess to understand
:everything in that document, but a comp sci degree means I just about
:followed :)
:
:I noticed you had decided not to store the versioned inodes in the
:B-Tree, which I believe is what HAMMER 1 did. I was curious why that
:was - I saw mention of not being able to have parent pointers in on
:media structures, but want sure if it was related and what the impact
:was.
:
:Otherwise, sounds pretty incredible; that any sub trees can be PFS,
:writable snapshots and multiple roots are all ingenious.
:
:Cheers, Alex J Burke.

    The B-Tree is a great indexing mechanism but it suffers from
    two big problems in HAMMER1.

    First it is very difficult to keep things organized for locality of
    reference.  For example when you split a B-Tree node, the split node
    can wind up seriously far away from the original with regards to
    disk seek times.  The seek inefficiencies are largely removed if one
    uses swapcache.

    Second and more importantly, manipulation of the B-Tree requires
    an UNDO FIFO (which HAMMER1 implements).  The UNDO FIFO is one of
    HAMMER1's most complex bits of code and how other implications as
    well, such as having to hold dirty buffers locked to prevent the
    kernel from writing them out too early (the UNDO data has to be
    flushed first).

    HAMMER2 primarily addresses the second point.  By chasing blocks
    all the way to the root HAMMER2 doesn't need an UNDO FIFO at all,
    and (theoretically) doesn't need special handling of the buffer
    cache by the OS because write ordering no longer matters.  The
    disk flush sequence prior to updating the volume root is still
    necessary but it's still far less complex than HAMMER1's UNDO
    sequencing.

    The trade-off is that HAMMER2 currently does not index its inode space
    by inode number (there being no B-Tree or other data structure to
    maintain such an index in), which is its primary reason for not being
    able to support hardlinks.  On the plus side the B-Tree in HAMMER1
    could support only simple snapshots while the block referential model
    HAMMER2 will use can also support writable snapshots (i.e. snapshot
    forks).

						-Matt





More information about the Kernel mailing list