HAMMER update 10-Feb-2008

Matthew Dillon dillon at apollo.backplane.com
Sun Feb 10 13:21:39 PST 2008


    HAMMER is really shaping up.  Here's what works now:

     * All filesystem operations
     * All historical operations
     * All Pruning features

    Here's what is left:

     * freemap code (allocate and free big-blocks, which are 8MB blocks).
       Currently a hack so everything else can be tested; nothing is
       actually freed.

     * undo fifo and related recovery code.  Most of the API calls are
       in place; the back-end buffer reservation, flushes, and recovery
       still need to be implemented.

     * big-block cleaning code (this is different from the pruning code).

     * Structural locking.  The B-Tree is fine-grained locked but the
       locks for the blockmap are just a hack (one big lock).

    These are all fairly low-difficulty items; most of the infrastructure
    needed to support their function is already in place, and the FIFO
    infrastructure has already been tested (just not mapped onto a blockmap
    yet).

    I have already run some tests with regard to the blockmap allocation
    model and it looks very good.  What I did was implement an array of
    blockmap entry structures rather than just an array of pointers to the
    actual physical big-blocks.  The blockmap entry structure not only has
    a pointer to the underlying physical big-block, it also has a
    bytes_free field which specifies how many bytes in the underlying
    big-block are free.
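
    To make that concrete, here is a minimal sketch of what such a blockmap
    entry could look like.  This is illustrative C only, not HAMMER's actual
    on-disk structure; the field names are invented for the example.

    #include <stdint.h>

    #define BIGBLOCK_SIZE   (8 * 1024 * 1024)       /* 8MB big-blocks */

    /*
     * One 32-byte entry per 8MB big-block.  A single 16K meta buffer
     * holds 16384 / 32 = 512 entries and therefore covers 4GB of
     * underlying storage.
     */
    struct blockmap_entry {
            uint64_t phys_offset;   /* physical big-block this entry governs */
            int32_t  bytes_free;    /* bytes still free in that big-block */
            uint32_t reserved[5];   /* pad the entry out to 32 bytes */
    };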

    That bytes_free count is the only tracking done by the blockmap.  It
    does not actually try to track WHERE in the big-block the free areas
    are... figuring
    that out will be up to the cleaning code.  What this gives us is the
    following:

    * Extremely fast freeing of on-disk storage elements.  The target
      physical block doesn't have to be read or written, only the governing
      blockmap entry.  With 8MB big-blocks and 32-byte blockmap entries one
      16K buffer can track 4GB worth of underlying storage, which means
      that freeing large amounts of sparse information does not cause the
      disk to seek all over the place.

      This is far, FAR better than the cluster model I was using last week
      and had to throw away.  Massively better.  Like night and day.

    * The all-free case can be detected and used to immediately return a
      completely-free big-block to the free pool.  I've done some testing
      and what this means is that removing large files or medium-sized
      sub-trees WILL in fact result in some immediate gratification.
      The space freed from sparse removal and pruning will take time
      to actually become reusable as the cleaning code will have to go
      through and finish cleaning out the big-block(s) in question.

    * The cleaning code is not complicated in the least (a rough sketch
      follows after this list).  All it needs to do is scan the B-Tree and
      check the blockmap entries for related references.  If the associated
      big-block has greater than a certain percentage of space free, the
      cleaning code will attempt to pack the remaining data (as it comes
      across it in the B-Tree) into a new block.  Since the B-Tree elements
      and records must be manipulated no matter which side you approach
      cleaning and packing from, this is no more difficult than trying to
      reverse engineer the remaining contents of a big-block.
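
    To tie the above together, here is a rough sketch in C of the free path
    and the cleaning pass, reusing the hypothetical blockmap_entry and
    BIGBLOCK_SIZE from the earlier example.  The helpers (btree_scan_next,
    entry_for_offset, relocate_record, return_bigblock_to_freepool) are
    invented stand-ins for whatever the real code ends up using, and the
    50% threshold is just a placeholder.

    struct btree_cursor;                    /* opaque scan state (hypothetical) */
    struct btree_record {
            uint64_t data_offset;           /* where this record's data lives */
            /* key, transaction ids, etc. elided */
    };

    struct btree_record *btree_scan_next(struct btree_cursor *cursor);
    struct blockmap_entry *entry_for_offset(uint64_t offset);
    void relocate_record(struct btree_record *rec, struct blockmap_entry *old);
    void return_bigblock_to_freepool(struct blockmap_entry *entry);

    /*
     * Free path: only the governing blockmap entry is touched; the
     * physical big-block is never read or written.  A big-block that
     * becomes entirely free is handed back to the free pool immediately.
     */
    static void
    blockmap_free(struct blockmap_entry *entry, int32_t bytes)
    {
            entry->bytes_free += bytes;
            if (entry->bytes_free == BIGBLOCK_SIZE)
                    return_bigblock_to_freepool(entry);
    }

    #define CLEAN_THRESHOLD (BIGBLOCK_SIZE / 2)     /* e.g. more than 50% free */

    /*
     * Cleaning pass: walk the B-Tree in order and, for each record whose
     * data sits in a mostly-free big-block, pack it into a new big-block.
     * The space credited back to bytes_free eventually empties the old
     * big-block, which the free path above then reclaims.
     */
    static void
    cleaning_pass(struct btree_cursor *cursor)
    {
            struct btree_record *rec;
            struct blockmap_entry *entry;

            while ((rec = btree_scan_next(cursor)) != NULL) {
                    entry = entry_for_offset(rec->data_offset);
                    if (entry->bytes_free > CLEAN_THRESHOLD)
                            relocate_record(rec, entry);
            }
    }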

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>
