HAMMER2 design in progress - 1-2 year time frame
Matthew Dillon
dillon at apollo.backplane.com
Fri May 13 08:13:55 PDT 2011
:Hi,
:
:I'm a DFBSD lurker and I certainly wouldn't profess to understand
:everything in that document, but a comp sci degree means I just about
:followed :)
:
:I noticed you had decided not to store the versioned inodes in the
:B-Tree, which I believe is what HAMMER 1 did. I was curious why that
:was - I saw mention of not being able to have parent pointers in on
:media structures, but want sure if it was related and what the impact
:was.
:
:Otherwise, sounds pretty incredible; that any sub trees can be PFS,
:writable snapshots and multiple roots are all ingenious.
:
:Cheers, Alex J Burke.
The B-Tree is a great indexing mechanism but it suffers from
two big problems in HAMMER1.
First it is very difficult to keep things organized for locality of
reference. For example when you split a B-Tree node, the split node
can wind up seriously far away from the original with regards to
disk seek times. The seek inefficiencies are largely removed if one
uses swapcache.
Second and more importantly, manipulation of the B-Tree requires
an UNDO FIFO (which HAMMER1 implements). The UNDO FIFO is one of
HAMMER1's most complex bits of code and how other implications as
well, such as having to hold dirty buffers locked to prevent the
kernel from writing them out too early (the UNDO data has to be
flushed first).
HAMMER2 primarily addresses the second point. By chasing blocks
all the way to the root HAMMER2 doesn't need an UNDO FIFO at all,
and (theoretically) doesn't need special handling of the buffer
cache by the OS because write ordering no longer matters. The
disk flush sequence prior to updating the volume root is still
necessary but it's still far less complex than HAMMER1's UNDO
sequencing.
The trade-off is that HAMMER2 currently does not index its inode space
by inode number (there being no B-Tree or other data structure to
maintain such an index in), which is its primary reason for not being
able to support hardlinks. On the plus side the B-Tree in HAMMER1
could support only simple snapshots while the block referential model
HAMMER2 will use can also support writable snapshots (i.e. snapshot
forks).
-Matt
More information about the Kernel
mailing list