HAMMER update 10-Feb-2008
Matthew Dillon
dillon at apollo.backplane.com
Sun Feb 10 13:21:39 PST 2008
HAMMER is really shaping up now. Here's what works now:
* All filesystem operations
* All historical operations
* All Pruning features
Here's what is left:
* freemap code (allocate and free big-blocks, which are 8MB blocks).
This is currently a hack so everything else can be tested; nothing is
actually freed.
* undo fifo and related recovery code. Most of the API calls are
in place, but the back-end buffer reservation, flushes, and recovery
still need to be implemented.
* big-block cleaning code (this is different from the pruning code).
* Structural locking. The B-Tree is fine-grained locked but the
locks for the blockmap are just a hack (one big lock).
These are all fairly low-difficulty items. Most of the infrastructure
needed to support them is already in place, and the FIFO
infrastructure has already been tested (just not mapped onto a blockmap
yet).
I have already run some tests with regard to the blockmap allocation
model and it looks very good. What I did was implement an array of
blockmap entry structures rather than just an array of pointers to the
actual physical big-blocks. The blockmap entry structure not only has
a pointer to the underlying physical big-block, it also has a
bytes_free field which specifies how many bytes in the underlying
big-block are free.
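A rough sketch of what such an entry could look like in C, purely for
illustration. Only the pointer to the physical big-block and the
bytes_free field come from the description above; the field names, the
types, and the reserved padding are assumptions, not the actual HAMMER
on-disk layout. The 8MB, 32-byte and 16K figures are the ones used in
this post, and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BIGBLOCK_SIZE (8ULL * 1024 * 1024)  /* 8MB big-block (from the post) */
#define BUFFER_SIZE   (16 * 1024)           /* 16K buffer (from the post) */

/*
 * Hypothetical blockmap entry.  The post only describes a pointer to
 * the physical big-block and a bytes_free count; everything else here
 * is padding assumed for illustration to reach 32 bytes.
 */
struct blockmap_entry {
    uint64_t phys_offset;   /* where the underlying big-block lives */
    int32_t  bytes_free;    /* free bytes in that big-block */
    uint32_t reserved[5];   /* assumed padding up to 32 bytes */
};

static_assert(sizeof(struct blockmap_entry) == 32,
              "blockmap entry is expected to be 32 bytes");

/* Number of blockmap entries that fit in one buffer. */
static inline uint64_t entries_per_buffer(void)
{
    return BUFFER_SIZE / sizeof(struct blockmap_entry);
}

/* Amount of underlying storage one buffer of entries can track. */
static inline uint64_t bytes_tracked_per_buffer(void)
{
    return entries_per_buffer() * BIGBLOCK_SIZE;
}
```

With these numbers one buffer holds 16384 / 32 = 512 entries, and
512 * 8MB = 4GB of tracked storage.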
This is the only tracking done by the blockmap. It does not actually
try to track WHERE in the big-block the free areas are... figuring
that out will be up to the cleaning code. What this gives us is the
following:
* Extremely fast freeing of on-disk storage elements. The target
physical block doesn't have to be read or written, only the governing
blockmap entry. With 8MB big-blocks and 32-byte blockmap entries one
16K buffer can track 4GB worth of underlying storage, which means
that freeing large amounts of sparse information does not cause the
disk to seek all over the place.
This is far, FAR better than the cluster model I was using last week
and had to throw away. Massively better. Like night and day.
* The all-free case can be detected and used to immediately return a
completely free big-block to the free pool. I've done some testing
and what this means is that removing large files or medium-sized
sub-trees WILL in fact result in some immediate gratification.
The space freed from sparse removal and pruning will take time
to actually become reusable as the cleaning code will have to go
through and finish cleaning out the big-block(s) in question.
* The cleaning code is not complicated in the least. All it needs to
do is scan the B-Tree and check the blockmap entries for related
references. If the associated big-block has more than a certain
percentage of space free, the cleaning code will attempt to pack
the remaining data (as it comes across it in the B-Tree) into a new
block. Since the B-Tree elements and records must be manipulated
no matter which side you approach cleaning and packing from, this is
no more difficult than trying to reverse engineer the remaining
contents of a big-block.
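
The three points above can be sketched in a few lines of C. This is a
minimal illustration under stated assumptions, not the HAMMER code: the
trimmed-down entry structure, the function names blockmap_free() and
blockmap_wants_cleaning(), and the percentage threshold parameter are
all hypothetical. It shows that freeing only touches the blockmap
entry, that the all-free case falls out of a single comparison, and
that picking cleaning candidates is just a check on bytes_free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIGBLOCK_SIZE (8 * 1024 * 1024)     /* 8MB big-block (from the post) */

/* Hypothetical, trimmed-down blockmap entry (names are assumptions). */
struct blockmap_entry {
    uint64_t phys_offset;   /* where the underlying big-block lives */
    int32_t  bytes_free;    /* free bytes in that big-block */
};

/*
 * Account for 'bytes' being freed in the big-block governed by 'e'.
 * Only the entry is modified; the big-block itself is never read or
 * written.  Returns true when the big-block has become entirely free,
 * in which case the caller could hand it straight back to the free
 * pool.
 */
static bool blockmap_free(struct blockmap_entry *e, int32_t bytes)
{
    assert(bytes >= 0 && bytes <= BIGBLOCK_SIZE - e->bytes_free);
    e->bytes_free += bytes;
    return e->bytes_free == BIGBLOCK_SIZE;
}

/*
 * Decide whether the cleaner should try to repack the live data left
 * in this big-block.  'threshold_pct' is an assumed tunable: the block
 * qualifies when more than that percentage of it is free.  A block
 * that is already completely free has nothing left to pack.
 */
static bool blockmap_wants_cleaning(const struct blockmap_entry *e,
                                    int threshold_pct)
{
    if (e->bytes_free == BIGBLOCK_SIZE)
        return false;
    return (int64_t)e->bytes_free * 100 >
           (int64_t)BIGBLOCK_SIZE * threshold_pct;
}
```

The entry records how much is free but not where, so a cleaner built
this way would still have to find the live data by walking the B-Tree,
as described above.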
-Matt
Matthew Dillon
<dillon at backplane.com>