Matthew Dillon dillon at apollo.backplane.com
Mon Nov 5 12:36:29 PST 2007

:Speaking of on-disk B-Trees, ReiserFS' biggest problems are all based
:on its use of flexible B-Trees. For instance, the difficulty of
:correctly rebuilding the tree if a node is damaged, and correctly
:detecting if a ReiserFS system is hosted on a file in another
:(supposedly damaged) ReiserFS system, are noted as big problems that
:are supposedly solved for Reiser4. Are there likely to be similar
:issues in Hammer for the time being, or have you already planned far
:ahead for these cases with extra information in the nodes?
:Dmitri Nikulin

     HAMMER has an independant B-Tree for each cluster.  The clusters are
     then glued together (leaf -> new cluster root) to form a unified B-Tree.
     Except for lookups and no-free-space cases, B-Tree operations are
     local to a cluster.

     A cluster in HAMMER is around 64MB and can have up to 4096 records.

     If a cluster needs to be recovered, HAMMER will simply throw away
     the B-Tree and regenerate it from scratch using the cluster's
     record list.  This way all B-Tree I/O operations can be asynchronous
     and do not have to be flushed on fsync.

     At the same time HAMMER will remove any records whos creation
     transaction id's are too large (i.e. not synchronized with the cluster
     header), and will zero out the delete transaction id for any records
     whos deletion transaction id's are too large.

     Essentially recovery is handled by performing a rollback on a
     cluster-by-cluster basis.

     HAMMER doesn't bother to scan clusters at mount time (that would be
     expensive!).  Instead it will recover a cluster on the fly when the
     cluster is first accessed and HAMMER notices that it is marked 'open'
     when it shouldn't be.


     The real performance issue for HAMMER is going to be B-Tree insertions
     and rebalancing across clusters.  I think most of the issues can be
     resolved with appropriate heuristics and by a backup ground process to
     slowly rebalance clusters.  This will require a lot of work, though,
     and only minimal rebalancing will be in for the release.


More information about the Kernel mailing list