HAMMER update 06-Feb-2008
Matthew Dillon
dillon at apollo.backplane.com
Thu Feb 7 12:03:40 PST 2008
:Now combine that with 64bit block idents and you have a "quasi infinite
:FIFO" :)
:
:cheers
: simon
Yup. It turned out to be surprisingly easy to replace the cluster stuff
with the FIFO and change the B-Tree to use full 64 bit offsets.
Amazingly easy, I'll be able to commit the major surgery tonight (it
already works except for the cleaning code and the undo records). I
added a hammer_off_t type which is broken down like this:
[vol_no:8][sanity:1][offset:55] (sanity bit is always 1 to catch
programming errors).
That allows 32768 TB per volume. I will probably steal a few more bits
to abstract out a blockmap layer, but I can't imagine needing more than
another 7 bits, giving us 256 TB per volume (x 256 volumes == 65536 TB).
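
A minimal sketch of the offset layout and the capacity arithmetic described
above. Only the hammer_off_t name and the [vol_no:8][sanity:1][offset:55]
split come from this post; the macro and helper names are hypothetical, and
1 TB is assumed to mean 2^40 bytes (which is what the 32768 TB figure implies).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hammer_off_t;  /* type name from the post; layout sketched below */

/* Hypothetical constants for [vol_no:8][sanity:1][offset:55]. */
#define OFF_BITS    55
#define OFF_MASK    ((UINT64_C(1) << OFF_BITS) - 1)
#define SANITY_BIT  (UINT64_C(1) << OFF_BITS)
#define VOL_SHIFT   56

/* Pack a volume number and an in-volume byte offset; the sanity bit is always set. */
static inline hammer_off_t off_encode(unsigned vol_no, uint64_t offset)
{
    assert(vol_no <= 0xFF);
    assert(offset <= OFF_MASK);
    return ((uint64_t)vol_no << VOL_SHIFT) | SANITY_BIT | offset;
}

static inline unsigned off_vol(hammer_off_t o)    { return (unsigned)(o >> VOL_SHIFT); }
static inline uint64_t off_offset(hammer_off_t o) { return o & OFF_MASK; }

/* A zeroed or otherwise uninitialized offset fails this check. */
static inline int off_is_sane(hammer_off_t o)     { return (o & SANITY_BIT) != 0; }

/* Addressable size in TB (2^40 bytes) for a given number of offset bits (>= 40). */
static inline uint64_t capacity_tb(unsigned offset_bits)
{
    return UINT64_C(1) << (offset_bits - 40);
}
```

With 55 offset bits capacity_tb() gives 32768; taking 7 bits away leaves 48
offset bits and 256 TB per volume, and 256 volumes of that size come to
65536 TB.
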
I've already learned a few things. With my first FIFO implementation,
where I am currently just laying things down willy nilly so data,
B-Tree nodes, and records are all mixed together linearly, my
'rm -rf /mnt' runs are a lot slower than before due to the lack of
locality of reference in the B-Tree and record headers.
My preliminary work on the cleaning code shows some performance
issues too, primarily the fact that when a node or record must be
moved from one location to another, all the related linkages (B-Tree
and record linkages) have to be adjusted as well. This argues for
using a 'named block' mechanic and having the FIFO be a blockmap of
named blocks. That way whole named blocks can be moved from the
front of the FIFO to the end without modification if they otherwise
do not need to be cleaned.
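
To illustrate why named blocks help, here is a rough sketch of a blockmap
that translates a block name into its current physical location. Everything
in it (struct blockmap, bmap_lookup, bmap_move, the fixed-size table) is
hypothetical and is not taken from the HAMMER sources. References store the
block name plus an offset inside the block, so relocating a block touches
one map entry and none of the B-Tree or record linkages.

```c
#include <stdint.h>

#define NBLOCKS 8  /* hypothetical, tiny for illustration */

/* phys[name] holds the current physical offset of the named block. */
struct blockmap {
    uint64_t phys[NBLOCKS];
};

/* Resolve (block name, offset within block) to a physical offset. */
static inline uint64_t bmap_lookup(const struct blockmap *bm,
                                   unsigned name, uint64_t in_block)
{
    return bm->phys[name] + in_block;
}

/*
 * Move a whole named block, e.g. from the front of the FIFO to the end.
 * Only the map entry changes; anything that refers to the block by name
 * stays valid without modification.
 */
static inline void bmap_move(struct blockmap *bm, unsigned name,
                             uint64_t new_phys)
{
    bm->phys[name] = new_phys;
}
```

A reference such as (name 2, offset 0x40) resolves to a different physical
address after bmap_move(), but the stored reference itself is unchanged.
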
Multiple virtualized FIFOs may be the answer to the locality of
reference issue. One FIFO for records, one for B-Tree nodes, and
one for DATA. Unlike the cluster mechanic, the indexes for a limited
number of virtualized FIFOs can all be atomically updated in a single
write of the volume header.
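
A sketch of how a small, fixed set of FIFO indexes might sit together in the
volume header. The struct layout, field names, and helpers are assumptions
made for illustration, and memcpy() merely stands in for the single header
write.

```c
#include <stdint.h>
#include <string.h>

/* One virtualized FIFO per object class, as suggested above. */
enum fifo_id { FIFO_RECORD, FIFO_BTREE, FIFO_DATA, FIFO_COUNT };

struct fifo_index {
    uint64_t first;  /* oldest live offset (front of the FIFO) */
    uint64_t next;   /* next offset to append at (end of the FIFO) */
};

/* Hypothetical header: all FIFO indexes live side by side. */
struct volume_header {
    struct fifo_index fifo[FIFO_COUNT];
};

/* Append to one FIFO in the in-core header and return the allocated offset. */
static inline uint64_t fifo_alloc(struct volume_header *vh, enum fifo_id id,
                                  uint64_t bytes)
{
    uint64_t off = vh->fifo[id].next;
    vh->fifo[id].next += bytes;
    return off;
}

/* Publish every FIFO index at once by writing the whole header. */
static inline void header_commit(struct volume_header *ondisk,
                                 const struct volume_header *incore)
{
    memcpy(ondisk, incore, sizeof(*ondisk));
}
```

Until header_commit() runs, the on-disk copy still describes the previous
consistent state of all three FIFOs.
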
We have plenty of bits in the 64 bit hammer offset to abstract out
both named blocks and virtualized fifos.
I'm already a lot happier with the FIFO mechanic than I was with the
cluster mechanic. It's pretty clear to me now that not only do we
want to use the FIFO mechanic, but we also want to use named blocks
and move those around instead of shifting the actual data, and
we definitely want more than one FIFO.
The really, really cool thing about this is that I can finish the
user-visible feature work and have the only bad thing be the known
performance issues, then work on the performance issues as a separate
task.
-Matt
Matthew Dillon
<dillon at backplane.com>