HAMMER filesystem update - design document
Matthew Dillon
dillon at apollo.backplane.com
Wed Oct 10 15:41:53 PDT 2007
:So, the filesystem is going to be the volume manager as well (like in
:ZFS), right? Will filesystems strictly be bounded to 'partitions' or
:'slices'?
:
:Another question: will this mirroring capability allow for an FS-level
:RAID like RAIDZ? I wonder whether the filesystem can be extended so it
:can achieve this.
:
:Disclaimer: yes, those are ZFS features which I am asking about, but
:no, I don't want a cluster-friendly ZFS ripoff, just asking.
:
:--
:Gergo Szakal MD <bastyaelvtars at gmail.com>
No, it isn't a volume manager; it's simply that the filesystem
can be made up of multiple volumes. Each cluster (say, a 256M chunk)
is integrated into the filesystem-wide B-Tree and can only be addressed
by its parent or by the parent pointers of its children. This means
that clusters can be migrated with minimal work, and thus can be
migrated while the filesystem is live. We don't have the situation
we have in UFS, where random inodes in the filesystem directly
reference random data blocks elsewhere in the filesystem.
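To illustrate why migration is cheap, here is a minimal sketch in a
hypothetical toy model (the Addr tuple, the Cluster class and migrate()
are illustrative names, not HAMMER code). It assumes the only references
to a cluster are its parent's child pointer and its children's parent
pointers, so relocating it means rewriting just those pointers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical cluster address: (volume number, cluster index on that volume).
Addr = Tuple[int, int]


@dataclass
class Cluster:
    """Toy stand-in for a cluster linked into the filesystem-wide B-Tree."""
    parent: Optional[Addr]                              # None for the root
    children: List[Addr] = field(default_factory=list)


def migrate(fs: Dict[Addr, Cluster], old: Addr, new: Addr) -> int:
    """Relocate the cluster at `old` to `new`.

    Returns the number of pointers rewritten. Only the parent's child
    pointer and each child's parent pointer refer to the cluster, so no
    scan of the rest of the filesystem is needed."""
    if new in fs:
        raise ValueError(f"destination {new} is already in use")
    cluster = fs.pop(old)
    fs[new] = cluster
    fixups = 0
    if cluster.parent is not None:
        # Point the parent at the new location.
        siblings = fs[cluster.parent].children
        siblings[siblings.index(old)] = new
        fixups += 1
    for child in cluster.children:
        # Point each child back at the new location.
        fs[child].parent = new
        fixups += 1
    return fixups
```

Moving a cluster with two children touches three pointers no matter how
large the filesystem is. With UFS-style direct block references, the
equivalent move would first require finding every inode that points into
the moved region.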
For example, if you had a HAMMER filesystem backed by two volumes you
could add a third volume, migrate all the data from the first volume
to the new volume, and then remove the first volume (so that it is no
longer part of the filesystem). Similarly, you could migrate the clusters
at the end of a volume elsewhere and then contract that volume, or
you could expand a volume and tell HAMMER to use the new space.
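The volume operations above can be sketched as bookkeeping over cluster
slots. This is a hypothetical model: VolumeSet, its slot counts and its
method names are assumptions made for illustration, and the per-cluster
copy and pointer fix-up is reduced to a comment. Removing a volume is
treated as contracting it to zero slots.

```python
from typing import Dict, Iterator, Set, Tuple


class VolumeSet:
    """Hypothetical model of the volumes backing one filesystem.

    Each volume is a row of fixed-size cluster slots. A cluster may live
    in any slot because only its B-Tree parent/children reference it."""

    def __init__(self) -> None:
        self.capacity: Dict[int, int] = {}   # volume id -> number of slots
        self.used: Dict[int, Set[int]] = {}  # volume id -> occupied slots

    def add_volume(self, vol: int, slots: int) -> None:
        self.capacity[vol] = slots
        self.used[vol] = set()

    def _free_slots(self, avoid_vol: int, avoid_from: int) -> Iterator[Tuple[int, int]]:
        """Yield free (volume, slot) pairs, skipping slots >= avoid_from
        on avoid_vol (the region being emptied)."""
        for vol, slots in self.capacity.items():
            limit = min(slots, avoid_from) if vol == avoid_vol else slots
            for s in range(limit):
                if s not in self.used[vol]:
                    yield vol, s

    def _evacuate(self, vol: int, from_slot: int) -> int:
        """Migrate every cluster at or beyond from_slot on vol elsewhere."""
        victims = sorted(s for s in self.used[vol] if s >= from_slot)
        targets = list(self._free_slots(vol, from_slot))
        if len(targets) < len(victims):
            # Check space up front so a failed request changes nothing.
            raise RuntimeError("not enough free space to migrate clusters")
        for src, (dst_vol, dst_slot) in zip(victims, targets):
            # In HAMMER terms, this is where the cluster would be copied
            # and its parent/child pointers fixed up.
            self.used[vol].discard(src)
            self.used[dst_vol].add(dst_slot)
        return len(victims)

    def remove_volume(self, vol: int) -> int:
        """Drain a volume completely, then drop it from the filesystem."""
        moved = self._evacuate(vol, 0)
        del self.capacity[vol], self.used[vol]
        return moved

    def resize_volume(self, vol: int, new_slots: int) -> int:
        """Expand a volume, or migrate its tail clusters and contract it."""
        moved = self._evacuate(vol, new_slots) if new_slots < self.capacity[vol] else 0
        self.capacity[vol] = new_slots
        return moved
```

Because every operation reduces to "move the clusters out of a region,
then change the volume's size", add, remove, contract and expand share
one code path in this sketch.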
I am not going to try to implement RAID inside HAMMER when RAID can be
done with a software or hardware solution in another layer.
HAMMER will do what hardware and software storage solutions can't
easily or efficiently do, which is logical replication of the entire
filesystem. A logical replication allows the different replication
targets to retain varying amounts of filesystem history. For
example, your production filesystem might retain 30-second snapshots
for an hour and hourly snapshots for a day, while one of your replication
targets might retain hourly snapshots for a day and daily snapshots
for a month, etc.
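The two retention schedules above can be expressed as tiered policies
applied independently to the same snapshot stream. This is a minimal
sketch under assumptions: the (granularity, window) policy format and
snapshots_to_keep() are hypothetical, and "keep the newest snapshot in
each bucket" is one plausible thinning rule, not necessarily HAMMER's.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical retention policy: tiers of (granularity in seconds,
# how long that tier is kept in seconds).
Policy = List[Tuple[int, int]]

HOUR, DAY = 3600, 86400

PRODUCTION: Policy = [(30, HOUR), (HOUR, DAY)]           # 30s for an hour, hourly for a day
ARCHIVE_TARGET: Policy = [(HOUR, DAY), (DAY, 30 * DAY)]  # hourly for a day, daily for a month


def snapshots_to_keep(snapshots: Iterable[int], now: int, policy: Policy) -> List[int]:
    """Return the snapshot timestamps a target with `policy` retains.

    For each tier, snapshots younger than the tier's window are grouped
    into buckets of `granularity` seconds and the newest one in each
    bucket is kept. A snapshot survives if any tier keeps it."""
    snaps = list(snapshots)  # iterated once per tier
    keep: Set[int] = set()
    for granularity, window in policy:
        newest: Dict[int, int] = {}
        for ts in snaps:
            if now - ts > window:
                continue  # too old for this tier
            bucket = ts // granularity
            if bucket not in newest or ts > newest[bucket]:
                newest[bucket] = ts
        keep.update(newest.values())
    return sorted(keep)
```

Fed the same stream of 30-second snapshots, the production policy keeps
a dense recent history while the archive target keeps a sparse but
longer one. Since the replication is logical, each target prunes on its
own schedule without affecting the others.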
Ultimately we will have a multi-master environment which will silently
handle whole or partial filesystem failures. In this case the type
of redundancy you need at the storage layer will depend on the number
of physical disks you need to use for each copy of the filesystem. If
your filesystem fits on one or two physical disks then you wouldn't
need any RAID at all. If each copy needs a bank of physical disks then
you might want the bank of disks to be RAIDed. At that point you'd
use a hardware or software RAID solution.
But is RAID absolutely necessary? Probably not. Consider a replicated
filesystem with each copy backed by an array of disks. Now say you
have a disk failure. The copy of the filesystem containing the disk
failure loses a portion of its B-Tree. It doesn't need to recover
the disk; you would just pull it, slap in a new one, and the
filesystem would reload that portion of the B-Tree from one of the
other replicated copies to repair itself.
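The self-repair described above can be sketched as reloading a lost key
range from any replica that still holds it. This is a toy model only:
Replica, fail_range() and repair() are hypothetical names, B-Tree
records are reduced to an int-keyed dict, and the replicas are assumed
to be in sync (differing history and multi-master conflict resolution
are ignored).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical half-open range of B-Tree keys [lo, hi) lost with one disk.
KeyRange = Tuple[int, int]


@dataclass
class Replica:
    """Toy model of one replicated copy of the filesystem's B-Tree."""
    records: Dict[int, bytes] = field(default_factory=dict)
    lost: List[KeyRange] = field(default_factory=list)

    def fail_range(self, lo: int, hi: int) -> None:
        """Simulate a dead disk: drop every record with a key in [lo, hi)."""
        for key in [k for k in self.records if lo <= k < hi]:
            del self.records[key]
        self.lost.append((lo, hi))

    def covers(self, lo: int, hi: int) -> bool:
        """True if none of this copy's lost ranges overlaps [lo, hi)."""
        return all(hi <= a or b <= lo for a, b in self.lost)


def repair(damaged: Replica, replicas: List[Replica]) -> int:
    """Reload each lost key range from the first replica that still has it.

    Returns the number of records restored. Ranges that no replica can
    supply stay in `damaged.lost`."""
    restored = 0
    still_lost: List[KeyRange] = []
    for lo, hi in damaged.lost:
        source = next((r for r in replicas if r.covers(lo, hi)), None)
        if source is None:
            still_lost.append((lo, hi))
            continue
        for key, value in source.records.items():
            if lo <= key < hi:
                damaged.records[key] = value
                restored += 1
    damaged.lost = still_lost
    return restored
```

A range is only unrecoverable when every replica has lost an overlapping
range, which is the failure case that per-copy RAID would otherwise
guard against.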
:University Of Szeged, HU
:Faculty Of General Medicine
:
:/* Please do not CC me with replies, thank you. */
:
-Matt
Matthew Dillon
<dillon at backplane.com>