syslink development update, HAMMER/ANVIL filesystem development update
Matthew Dillon
dillon at apollo.backplane.com
Mon Apr 9 11:13:39 PDT 2007
I'm making good progress on the syslink code. The system call keeps
morphing but in the next commit it should settle down.
I ran into a bit of a problem with the 'syslink route node' concept
when I decided I wanted to be able to attach a UDP broadcast socket
to a syslink route node. Originally the concept was to attach
individual connections only, but I realized that it made little sense
not to take advantage of hardware-switched broadcast mechanisms
since the leaves of most clusters were likely to be operating over
switched networks.
Attaching a broadcast socket to a syslink route node requires an
aligned subset of the route node's address space to be associated with
the socket (in order to represent the network's subnet space and be able
to directly map some of the address bits to the IP space). After
some fooling around I decided to adapt the SUBR_BLIST code to the task
and have committed SUBR_ALIST as a result. This will allow the
syslink route node to trivially attach mixed-size subnets (down to
single entities). I hope to have the syslink mesh infrastructure done
in about another week or so and will then start working on the protocols
and kernel device/filesystem namespace interfacing.
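
To illustrate why the subset has to be aligned: if the block of route
node addresses given to the socket is a power-of-2 size and starts on a
multiple of that size, the low bits of a node address can be copied
straight into the host bits of the IP address. The sketch below is in
Python rather than kernel C. The function name, the address layout and
the 10.0.0.0/24 network are all invented for the example and are not
the syslink address format.

```python
import ipaddress

def node_to_ip(node_addr, block_base, subnet):
    """Map a route-node address inside an aligned block to an IP address.

    Hypothetical layout: the block covers subnet.num_addresses node
    addresses starting at block_base, and block_base must be a multiple
    of the block size so the low bits line up with the IP host bits.
    """
    size = subnet.num_addresses          # a power of 2 for any IP subnet
    if block_base % size != 0:
        raise ValueError("block is not aligned to its size")
    if not block_base <= node_addr < block_base + size:
        raise ValueError("address is outside the block")
    host_bits = node_addr & (size - 1)   # low bits carry over unchanged
    return subnet.network_address + host_bits

subnet = ipaddress.ip_network("10.0.0.0/24")
ip = node_to_ip(0x1205, 0x1200, subnet)
print(ip)  # 10.0.0.5
```

With an unaligned base such as 0x1201 the simple mask no longer works,
which is the situation the alignment requirement is meant to rule out.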
--
SUBR_BLIST is a bitmap allocator that I originally developed for FreeBSD
to manage swap space allocations. It is implemented using a radix
tree with hinting. SUBR_ALIST is the same thing, but designed for
power-of-2 allocations, and it guarantees alignment to the allocation
size. For example, a 4KB allocation would be aligned to a 4KB boundary.
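
The alignment rule can be stated in a few lines. This is only a sketch
of the property, not of SUBR_ALIST itself. Both helper names are made
up, and rounding an odd size up to the next power of 2 is just one way
a caller might arrive at a legal size, not something the post specifies.

```python
def round_up_pow2(n):
    """Smallest power of 2 that is >= n (n must be positive)."""
    if n <= 0:
        raise ValueError("size must be positive")
    return 1 << (n - 1).bit_length()

def is_valid_placement(offset, size):
    """True if size is a power of 2 and offset is a multiple of size."""
    return size > 0 and size & (size - 1) == 0 and offset % size == 0

KB = 1024
print(is_valid_placement(8 * KB, 4 * KB))   # True: 8KB is a 4KB boundary
print(is_valid_placement(6 * KB, 4 * KB))   # False: 6KB is not
print(round_up_pow2(5000))                  # 8192
```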
It turns out that SUBR_ALIST has other interesting characteristics that
make it suitable for the cluster filesystem design. And, in fact, the
design characteristics also make the ALIST allocator suitable for
managing physical memory, particularly in its ability to allow us to
cut up memory into chunks which are properly aligned no matter what
the page size used. It would be possible to manage page segment
mappings entirely dynamically if we wanted to go that direction (not
on my list, though).
--
The current filesystem design revolves around the filesystem layer,
HAMMER, and the storage layer, ANVIL.
ANVIL will be responsible for very large block reservations (e.g.
1MB to 1TB 'blocks' or even larger). These blocks will be
self-identifying and assigned to filesystems and/or logical block
devices. The key feature of ANVIL will be that since these blocks are
self-identifying, they can be moved across to different media without
upper layers knowing or caring. For example, you could construct a
filesystem based on ANVIL blocks you allocate from a single disk
drive, and on a live system migrate some or all of those ANVIL blocks
to another disk drive. You could physically move a disk drive from
one machine to another and the cluster mesh would recognize it, no
copying or reconfiguring required.
In order to properly detect ANVIL blocks on physical media, the blocks
must be aligned in a way that a scan of the physical media is able to
locate them without any other knowledge. This will be accomplished
by doing a power-of-2 scan. For example, a 256GB hard drive would be
scanned by checking the block at the 128GB mark, then all the 64GB marks,
and so on to locate the ANVIL block headers on the media and build an
allocation/management map in kernel memory. In addition, it will be
possible to allocate (and deallocate) ANVIL blocks on the fly (for
example, to resize a filesystem). Just like cutting up an address
space into variable-sized subnets, ANVIL blocks can also be variable
sized as long as they adhere to the alignment requirements.
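
The scan order described above can be sketched as a generator that
probes the coarsest alignment first. This is an illustration only. The
function name is hypothetical, probing offset 0 first and the 32GB
minimum block size in the demo are assumptions, and both sizes are
assumed to be powers of 2.

```python
def scan_offsets(media_size, min_block):
    """Yield probe offsets, coarsest alignment first.

    Offset 0 is probed first, then media_size/2, then the odd multiples
    of media_size/4, and so on down to min_block, so every multiple of
    min_block is visited exactly once.
    """
    yield 0
    step = media_size // 2
    while step >= min_block:
        # Odd multiples of step are the offsets not already probed
        # at a coarser alignment.
        for off in range(step, media_size, 2 * step):
            yield off
        step //= 2

GB = 1 << 30
order = [off // GB for off in scan_offsets(256 * GB, 32 * GB)]
print(order)  # [0, 128, 64, 192, 32, 96, 160, 224]
```

A real scan could presumably skip the offsets that fall inside a block
it has already found, since the header gives the block's size. The
sketch does not attempt that.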
The ANVIL layer will scan available physical media for ANVIL blocks
and piece them together into higher-level entities such as filesystems.
One of the really nice things about ANVIL is that it won't care where
it gets the pieces from.
It turns out that the ALIST allocator is the perfect solution to this
problem because the ALIST alignment requirements are precisely the
same as the ANVIL alignment requirements. For example, let's say we
had a 1TB hard drive and wanted to allow 1GB-1TB ANVIL blocks to be
allocated out of it. The ALIST allocation map would thus have to
support up to 1024 1GB blocks. Out of this 1TB drive we might allocate,
say, a few dozen 1GB blocks, a few dozen 16GB blocks, maybe a large
128GB block, and so forth. The ALIST allocator would be used to do
this and eat up only around 384 bytes of kernel memory to manage the
1024 'block' space.
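
The mixed-size example can be tried out with a toy allocator. This is
NOT the ALIST implementation: it is a flat first-fit map using one byte
per block (so 1024 bytes here, rather than the compact radix tree with
hinting), and the class and method names are invented. It only shows
that power-of-2 requests of different sizes can share one 1024-block
space while each stays aligned to its own size.

```python
class ToyAlignedAllocator:
    """First-fit power-of-2 allocator over a flat map (not a radix tree)."""

    def __init__(self, nblocks):
        self.used = bytearray(nblocks)   # one byte per block, 0 = free

    def alloc(self, count):
        """Return the first free offset aligned to count, or None."""
        if count <= 0 or count & (count - 1):
            raise ValueError("count must be a power of 2")
        # Only offsets that are multiples of count are candidates.
        for start in range(0, len(self.used) - count + 1, count):
            if not any(self.used[start:start + count]):
                self.used[start:start + count] = b"\x01" * count
                return start
        return None

    def free(self, start, count):
        """Mark count blocks starting at start as free again."""
        self.used[start:start + count] = bytes(count)

# 1TB drive managed in 1GB units: 1024 blocks.
drive = ToyAlignedAllocator(1024)
big = drive.alloc(128)                            # one 128GB block
ones = [drive.alloc(1) for _ in range(24)]        # two dozen 1GB blocks
sixteens = [drive.alloc(16) for _ in range(24)]   # two dozen 16GB blocks

print(big, ones[0], sixteens[0])  # 0 128 160
```

The first 16GB block lands at 160 rather than 152 because 152 is not a
multiple of 16, which leaves a small gap that later 1GB, 2GB, 4GB or
8GB requests can still use.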
--
For HAMMER, the filesystem design I am working on, I am looking
into being able to use ALISTs for extent management within the
filesystem. This research is very early, though, and I don't know
how useful ALISTs will be in HAMMER due to the time domain snapshot
mechanisms I want to implement.
-Matt
Matthew Dillon
<dillon at backplane.com>