HAMMER filesystem update - design document

Wed Oct 10 15:41:53 PDT 2007

:So, the filesystem is going to be the volume manager as well (like in
:ZFS), right? Will filesystems strictly be bound to 'partitions' or
:'slices'?
:
:Another question: will this mirroring capability allow for an FS-level
:RAID like RAIDZ? I wonder whether the filesystem can be extended so it
:can achieve this.
:
:Disclaimer: yes, those are ZFS features which I am asking about, but
:no, I don't want a cluster-friendly ZFS ripoff, just asking.
:
:-- 
:Gergo Szakal MD <bastyaelvtars at gmail.com>

    No, it isn't a volume manager.  The filesystem can simply be made up
    of multiple volumes.  Each cluster (say, a 256M chunk)
    is integrated into the filesystem-wide B-Tree and can only be addressed
    by its parent or by the parent pointers of its children.  This means
    that clusters can be migrated with minimal work and thus can be migrated
    while the filesystem is live.  We don't have the situation we have
    in UFS, where random inodes in the filesystem directly reference
    random data blocks elsewhere in the filesystem.
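
    A minimal sketch of why migration stays cheap, assuming a purely
    in-memory model: the Cluster class and the add_child, references_to
    and migrate helpers are hypothetical names for illustration, not
    HAMMER's on-disk structures.  The point is only that the set of
    pointers to fix up after a move is the parent plus the children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a cluster (not the real layout)."""
    cluster_id: int
    volume: str                               # volume currently holding it
    parent: Optional["Cluster"] = None
    children: List["Cluster"] = field(default_factory=list)


def add_child(parent: Cluster, child: Cluster) -> None:
    """Link child under parent in the filesystem-wide tree."""
    child.parent = parent
    parent.children.append(child)


def references_to(cluster: Cluster) -> List[Cluster]:
    """Clusters holding a pointer to `cluster`: its parent and its children."""
    refs = []
    if cluster.parent is not None:
        refs.append(cluster.parent)
    refs.extend(cluster.children)
    return refs


def migrate(cluster: Cluster, new_volume: str) -> List[int]:
    """Move a cluster and return the ids of clusters whose pointers change."""
    cluster.volume = new_volume
    return [c.cluster_id for c in references_to(cluster)]
```

    In this model, moving a cluster touches only its direct neighbours
    in the tree, however large the rest of the filesystem is.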

    For example, if you had a HAMMER filesystem backed by two volumes you
    could add a third volume, migrate all the data from the first volume
    to the new volume, and then remove the first volume (make it not part
    of the filesystem any more).  Similarly you could migrate the clusters
    at the end of a volume elsewhere and then contract that volume, or
    you could expand a volume and tell HAMMER to use the new space.
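
    The add/migrate/remove sequence above can be sketched as follows.
    This assumes a toy layout, a dict mapping volume names to sets of
    cluster ids; evacuate_volume is a hypothetical helper, not a real
    HAMMER command or API.

```python
def evacuate_volume(layout, source, target):
    """Move every cluster from `source` to `target`, then drop `source`.

    `layout` maps volume name -> set of cluster ids (a toy model only).
    """
    if source not in layout or target not in layout:
        raise KeyError("both volumes must be part of the filesystem")
    if source == target:
        raise ValueError("source and target must differ")
    # sorted() returns a new list, so removing while iterating is safe.
    for cluster_id in sorted(layout[source]):
        layout[source].remove(cluster_id)
        layout[target].add(cluster_id)
    # The source volume is now empty and can leave the filesystem.
    del layout[source]
    return layout


layout = {"vol0": {1, 2}, "vol1": {3}}
layout["vol2"] = set()                  # add a third, empty volume
evacuate_volume(layout, "vol0", "vol2")
```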

    I am not going to try to implement RAID inside HAMMER when RAID can be
    done with a software or hardware solution in another layer.

    HAMMER will do what hardware and software storage solutions can't
    easily or efficiently do, which is logical replication of the entire
    filesystem.  A logical replication allows the different replication
    targets to retain varying amounts of filesystem history.  For
    example, your production filesystem might retain 30-second snapshots
    for an hour and hourly snapshots for a day, while one of your replication
    targets might retain hourly snapshots for a day and daily snapshots
    for a month, etc.
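
    One way to picture per-target retention, as a sketch only: each
    policy is a list of (interval, maximum age) tiers in seconds, and a
    snapshot is kept if it is the earliest one in its interval slot for
    some tier.  The retained function, the tier encoding and the 30-day
    month are assumptions made for illustration, not HAMMER's actual
    pruning rules.

```python
HOUR = 3600
DAY = 24 * HOUR

# (snapshot interval, how far back that interval is kept), in seconds.
PRODUCTION = [(30, HOUR), (HOUR, DAY)]
REPLICA = [(HOUR, DAY), (DAY, 30 * DAY)]   # "a month" assumed to be 30 days


def retained(snapshots, now, policy):
    """Return the snapshot timestamps that `policy` keeps as of `now`."""
    keep = set()
    for interval, max_age in policy:
        seen_slots = set()
        for ts in sorted(snapshots):
            age = now - ts
            if age < 0 or age > max_age:
                continue                    # outside this tier's window
            slot = ts // interval
            if slot not in seen_slots:      # keep the earliest in each slot
                seen_slots.add(slot)
                keep.add(ts)
    return sorted(keep)


# Two hours of 30-second snapshots, seen by both policies.
snaps = list(range(0, 2 * HOUR + 1, 30))
production_view = retained(snaps, 2 * HOUR, PRODUCTION)
replica_view = retained(snaps, 2 * HOUR, REPLICA)
```

    The same stream of snapshots yields a dense recent history on the
    production side and a sparse, longer-lived one on the replica.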

    Ultimately we will have a multi-master environment which will silently
    handle whole or partial filesystem failures.  In this case the type
    of redundancy you need at the storage layer will depend on the number
    of physical disks you need to use for each copy of the filesystem.  If
    your filesystem fits on one or two physical disks then you wouldn't
    need any RAID at all.  If each copy needs a bank of physical disks then
    you might want the bank of disks to be RAIDed.  At that point you'd
    use a hardware or software RAID solution.

    But is RAID absolutely necessary?  Probably not.  Consider a replicated
    filesystem with each copy backed by an array of disks.  Now say you 
    have a disk failure.  The copy of the filesystem containing the disk
    failure loses a portion of its B-Tree.  It doesn't need to recover
    the disk.  You would just pull it and slap in a new one, and the
    filesystem would reload that portion of the B-Tree from one of the
    other replicated copies to repair itself.
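
    A rough sketch of that repair step, assuming each copy of the
    filesystem is modelled as a dict from cluster id to its contents.
    repair_from_replica is a hypothetical helper, and the real repair
    would work on B-Tree ranges rather than a flat dict.

```python
def repair_from_replica(damaged, lost_clusters, replicas):
    """Refill the lost clusters of `damaged` from the other copies.

    Returns the cluster ids that no replica could supply.
    """
    unrecovered = []
    for cluster_id in lost_clusters:
        for replica in replicas:
            if cluster_id in replica:
                damaged[cluster_id] = replica[cluster_id]
                break
        else:
            # No replica holds this cluster.
            unrecovered.append(cluster_id)
    return unrecovered


copy_a = {1: "btree-part-1", 2: "btree-part-2", 3: "btree-part-3"}
copy_b = dict(copy_a)

lost = [2, 3]                  # clusters that lived on the failed disk
for cluster_id in lost:
    del copy_a[cluster_id]

missing = repair_from_replica(copy_a, lost, [copy_b])
```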

:University Of Szeged, HU
:Faculty Of General Medicine
:
:/* Please do not CC me with replies, thank you. */
:

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>