HAMMER filesystem update - design document
mneumann at ntecs.de
Thu Oct 11 01:07:20 PDT 2007
Matthew Dillon wrote:
:This is the functional equivalent of a RAID1, and that is all HAMMER
:provides; the point of RAIDZ (and RAID3,4,5,6,etc) is that you don't
:need 2n bytes worth of disk for n bytes worth of usable storage, yet
:keeping some level of resilience. There is something to be said for this
:kind of scheme, namely not wasting as much disk space, but in the case
:of RAID1,0,10,01, moving that to a different layer (e.g. Vinum) is good
Yes and no. The reason it isn't quite the same is that RAID storage
has no ability to recover from corruption generated by the filesystem
code itself or corruption caused by other parts of the kernel or by
hardware snafus which occur prior to the data getting onto the platter.
When you do logical replication, however, the possibility of this sort of
corruption seeping into all the replicated copies is greatly reduced
and the replicated copies can check against each other to detect
even more such cases. So with replication you get a degree of detection
plus the ability to recover (correct) the corrupted data.
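
To make the detect-and-correct idea concrete, here is a minimal sketch of
replicas checking against each other. It is not HAMMER code: the function
names, the use of SHA-256, and the majority-vote rule are all assumptions
made for illustration. With only two copies a mismatch can be detected but
not attributed to one side, so the sketch declines to repair in that case.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def checksum(data: bytes) -> str:
    """Content checksum used to compare replicas (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def cross_check(replicas: Dict[str, bytes]) -> Tuple[Optional[bytes], List[str]]:
    """Compare replicated copies of one object.

    Returns (good_data, names_of_bad_replicas). If no strict majority of
    replicas agree, returns (None, all_names): corruption is detected but
    cannot be corrected from the copies alone.
    """
    sums = {name: checksum(data) for name, data in replicas.items()}
    winner, votes = Counter(sums.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        return None, sorted(replicas)
    good = next(data for name, data in replicas.items() if sums[name] == winner)
    bad = sorted(name for name, s in sums.items() if s != winner)
    return good, bad


def repair(replicas: Dict[str, bytes]) -> List[str]:
    """Overwrite minority copies with the majority copy; return what was fixed."""
    good, bad = cross_check(replicas)
    if good is None:
        return []
    for name in bad:
        replicas[name] = good
    return bad
```

A block-level RAID mirror cannot do this for corruption introduced above the
block layer, because both halves of the mirror receive the same bad write.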
Also one always has one and possibly several backups, both on-site
and off-site. A standard RAID system does not give you a functional
backup of your data, it just gives you redundancy. Replication
coupled with HAMMER's historical data store gives you a functional
backup AND replication at the same time, without having to add yet
more physical storage. That is a big deal.
:In a clustering environment, it's not likely that you'll want anything
:other than full replication, but at least on single-node storage
:systems, using storage more efficiently has its uses; even though it
:means longer recovery times.
: Thomas E. Spanjaard
This is something I have been thinking about. It would be possible
to replicate just a portion of a filesystem but doing it properly would
require HAMMER to support a 'filesystem within a filesystem' abstraction
in order to be able to use the same object ids in the replicated subset
that the originator used.
Even though only a subset of files is being replicated, the target must
be able to store objects across the source's entire object id space.
So what you want to do is create a filesystem within the target's
filesystem to hold the replication of the subset.
e.g. something like this (pseudo code):
replicate /elsewhere/my_source /hammer/my_source_backup
replicate /elsewhere/my_pictures /hammer/my_pictures_backup
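
The reason a 'filesystem within a filesystem' is needed can be shown with a
small model. This is only a sketch under assumptions: the TargetStore class,
its method names, and the integer object ids are hypothetical and say nothing
about HAMMER's actual on-disk layout. The point is that each replicated
subset gets its own object id space, so ids copied from different sources
never collide with each other or with the target's own objects.

```python
class TargetStore:
    """Toy model of a target filesystem holding several replicated subsets.

    Each namespace stands in for a 'filesystem within a filesystem' and
    keeps objects under the ids the source assigned to them.
    """

    LOCAL = "<local>"  # hypothetical name for the target's own id space

    def __init__(self):
        self._spaces = {self.LOCAL: {}}
        self._next_local_id = 1

    def create_namespace(self, name):
        """Create an empty id space for one replication target."""
        if name in self._spaces:
            raise ValueError("namespace already exists: " + name)
        self._spaces[name] = {}

    def create_local(self, data):
        """Store an object owned by the target; the target picks the id."""
        obj_id = self._next_local_id
        self._next_local_id += 1
        self._spaces[self.LOCAL][obj_id] = data
        return obj_id

    def replicate(self, namespace, source_obj_id, data):
        """Store a replicated object under the id the source already uses."""
        self._spaces[namespace][source_obj_id] = data

    def lookup(self, namespace, obj_id):
        """Fetch an object by (namespace, id)."""
        return self._spaces[namespace][obj_id]
```

Usage, mirroring the pseudo code above: create one namespace per replicated
subset, then store each incoming object under its source id.

    store = TargetStore()
    store.create_namespace("/hammer/my_source_backup")
    store.replicate("/hammer/my_source_backup", 1, b"main.c")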
This seems pretty useful to me (backup /home but not /usr), but can't
the same be achieved by using multiple partitions and mounting them
into a big filesystem?
I tried to achieve something similar with mountctl and jscan, but it
# only one root partition which contains everything!
mount_nullfs /home /mnt/fs_to_backup
mountctl -a -w /tmp/journal /mnt/fs_to_backup:test
Seems like mountctl is not able to work with nullfs-mounted filesystems.