HAMMER filesystem update - design document

Thu Oct 11 01:07:20 PDT 2007

Matthew Dillon wrote:
:This is the functional equivalent of a RAID1, and that is all HAMMER 
:provides; the point of RAIDZ (and RAID3,4,5,6,etc) is that you don't 
:need 2n bytes worth of disk for n bytes worth of usable storage, yet 
:keeping some level of resilience. There is something to be said for this 
:kind of scheme, namely not wasting as much disk space, but in the case 
:of RAID1,0,10,01, moving that to a different layer (e.g. Vinum) is good 
:enough.
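
The space argument in the quoted text can be put into numbers. The following is only a sketch: the function names are made up for illustration, and the parity case assumes a plain single-parity layout (RAID5-like) where one disk's worth of every stripe holds parity. It is not a model of how RAIDZ actually lays out data.

```python
def raw_bytes_mirror(usable: int, copies: int = 2) -> int:
    """Raw disk needed to keep `copies` full copies of `usable` bytes."""
    return usable * copies


def raw_bytes_single_parity(usable: int, disks: int) -> float:
    """Raw disk needed when 1 of every `disks` stripe units is parity.

    Assumption: simple RAID5-like layout, all disks the same size.
    """
    if disks < 2:
        raise ValueError("single parity needs at least 2 disks")
    return usable * disks / (disks - 1)


n = 1000
print(raw_bytes_mirror(n))             # 2000   (2n bytes for n usable)
print(raw_bytes_single_parity(n, 5))   # 1250.0 (1.25n bytes for n usable)
```

Under these assumptions a two-way mirror always costs 2n, while single parity over five disks costs 1.25n, which is the saving the quoted text refers to.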

    Yes and no.  The reason it isn't quite the same is that RAID storage
    has no ability to recover from corruption generated by the filesystem
    code itself or corruption caused by other parts of the kernel or by
    hardware snafus which occur prior to the data getting onto the platter.
    When you do logical replication, however, the possibility of this sort of
    corruption seeping into all the replicated copies is greatly reduced
    and the replicated copies can check against each other to detect
    even more such cases.  So with replication you get a degree of detection
    plus the ability to recover (correct) the corrupted data.

    Also, one always has at least one and possibly several backups, both on-site
    and off-site.  A standard RAID system does not give you a functional
    backup of your data, it just gives you redundancy.   Replication
    coupled with HAMMER's historical data store gives you a functional
    backup AND replication at the same time, without having to add yet
    more physical storage.  That is a big deal.
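
The idea of replicated copies checking against each other can be sketched as follows. This is an illustration only, not HAMMER's mechanism: the majority-vote rule, the use of SHA-256, and the names `check_replicas`, "master", "slave1" and "slave2" are all assumptions made for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def check_replicas(copies: Dict[str, bytes]) -> Tuple[Optional[bytes], List[str]]:
    """Compare replicas of one object.

    Returns (data agreed on by a strict majority, names of bad replicas).
    If no strict majority exists, the mismatch is detected but cannot be
    corrected from the comparison alone, so (None, all names) is returned.
    """
    if not copies:
        raise ValueError("need at least one replica")
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None, sorted(copies)
    bad = sorted(name for name, d in digests.items() if d != winner)
    good = next(data for name, data in copies.items()
                if digests[name] == winner)
    return good, bad


replicas = {"master": b"hello", "slave1": b"hello", "slave2": b"hellp"}
print(check_replicas(replicas))   # (b'hello', ['slave2'])
```

With only two copies a mismatch can be detected but the comparison cannot say which side is right; some other check would be needed to pick the good copy.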

:In a clustering environment, it's not likely that you'll want anything 
:other than full replication, but at least on single-node storage 
:systems, using storage more efficiently has its uses; even though it 
:means longer recovery times.
:
:Cheers,
:-- 
:         Thomas E. Spanjaard

    This is something I have been thinking about.  It would be possible
    to replicate just a portion of a filesystem but doing it properly would
    require HAMMER to support a 'filesystem within a filesystem' abstraction
    in order to be able to use the same object ids in the replicated subset
    that the originator used.

    Even though only a subset of files are being replicated the target must
    be able to store objects across the source's entire object id space.

    So what you want to do is create a filesystem within the target's
    filesystem to hold the replication of the subset.

    e.g. something like this (pseudo code):

	mkfilesystem /hammer/my_source_backup
	replicate /elsewhere/my_source /hammer/my_source_backup
	mkfilesystem /hammer/my_pictures_backup
	replicate /elsewhere/my_pictures /hammer/my_pictures_backup
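
Why the nested filesystem is needed can be shown with a toy model. This is a sketch under loud assumptions: the `Filesystem` class, its sequential id allocation, and the in-memory dicts are hypothetical and say nothing about HAMMER's real on-disk design. Only the names `mkfilesystem` and `replicate` are borrowed from the pseudo code above.

```python
class Filesystem:
    """Toy object store with its own private object id space."""

    def __init__(self):
        self.objects = {}    # object id -> contents
        self.children = {}   # name -> nested Filesystem
        self.next_id = 1

    def create(self, data):
        """Store data under the next free object id and return that id."""
        obj_id = self.next_id
        self.next_id += 1
        self.objects[obj_id] = data
        return obj_id

    def mkfilesystem(self, name):
        """Create a nested filesystem with a fresh, empty id space."""
        fs = Filesystem()
        self.children[name] = fs
        return fs


def replicate(source, target, wanted_ids):
    """Copy a subset of objects, keeping the source's object ids."""
    for obj_id in wanted_ids:
        if obj_id in target.objects:
            raise KeyError(f"object id {obj_id} already in use on target")
        target.objects[obj_id] = source.objects[obj_id]


source = Filesystem()
for data in (b"a", b"b", b"c"):
    source.create(data)              # source ids 1, 2, 3

hammer = Filesystem()
hammer.create(b"local file")         # target already uses id 1
hammer.create(b"another local")      # ... and id 2

try:
    replicate(source, hammer, [2, 3])    # id 2 collides with a local object
except KeyError as err:
    print("direct replication failed:", err)

backup = hammer.mkfilesystem("my_source_backup")
replicate(source, backup, [2, 3])        # separate id space, no collision
print(sorted(backup.objects))            # [2, 3]
```

The subset keeps the originator's ids 2 and 3 inside the nested filesystem, while the target's own objects 1 and 2 are left untouched.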

This seems pretty useful to me (backup /home but not /usr), but can't 
the same be achieved by using multiple partitions and mounting them
into a big filesystem?

I tried to achieve something similar with mountctl and jscan, but it 
didn't work:

  # only one root partition which contains everything!

  mount_nullfs /home /mnt/fs_to_backup

  mountctl -a -w /tmp/journal /mnt/fs_to_backup:test

Seems like mountctl is not able to work with nullfs-mounted filesystems.

Regards,

  Michael