HAMMER and RAID 5
wbh at conducive.org
Mon Mar 2 21:27:14 PST 2009
Dmitri Nikulin wrote:
On Tue, Mar 3, 2009 at 1:08 PM, Mag Gam <magawake at gmail.com> wrote:
I was wondering if HAMMER will ever have network based RAID 5. After
researching several file systems it seems HAMMER is probably the
closest to solving this problem, which would make HAMMER a pioneer.
Intuitively, I highly doubt network RAID5 is worth it. Even local-disk
RAID5 is unusable for many workloads.
In contrast, check out some of the more flexible RAID10 modes
available in Linux:
You can get N/M effective space (N raw storage / M copies) with
RAID0-like striping for all of it. It performs very well and certainly
much better than the parity-based RAID5.
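To make the N/M figure concrete, here is a minimal sketch of the arithmetic. The function name and the example pool (four 1 TB disks, two copies) are made up for illustration; nothing here is a real md or HAMMER interface.

```python
def effective_space(raw_total, copies):
    """Usable capacity when every block is stored `copies` times (N / M)."""
    if copies < 1:
        raise ValueError("need at least one copy of each block")
    return raw_total / copies


# Hypothetical pool: four 1 TB disks (N = 4 TB raw), M = 2 copies of each block.
usable_tb = effective_space(raw_total=4 * 1.0, copies=2)
print(f"usable: {usable_tb} TB")
```

The same division applies to an odd number of disks, e.g. 3 TB raw with 2 copies gives 1.5 TB usable.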
Imagine how RAID5 would work with network devices:
Read old data block from one server
Read parity block from another server
Generate new parity block
Write data block to one server
Write parity block to another server
All with NO atomicity guarantees, so HAMMER would have to pick up the
slack. Even in the best case you have 8x the latency of a single trip
to a machine (4 request/response pairs of 2 IOs each). Compare that with
a single round trip (2 IOs) to write to a plain slave, or N round trips
for N redundant copies. What is an acceptable penalty on local disks
is pretty heavy for network storage.
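The write sequence and trip count above can be sketched roughly as follows. This is an illustration only: the function names are hypothetical, the "servers" are just byte strings, and the four network exchanges are assumed to run strictly one after another, as counted above. The parity update itself is the standard XOR read-modify-write.

```python
def raid5_small_write(old_data, old_parity, new_data):
    """Return (new_parity, request/response pairs) for a one-block update."""
    pairs = 0
    pairs += 1  # read old data block from one server
    pairs += 1  # read old parity block from another server
    # Generate the new parity locally: XOR the old data out, the new data in.
    new_parity = bytes(p ^ o ^ n
                       for p, o, n in zip(old_parity, old_data, new_data))
    pairs += 1  # write data block to one server
    pairs += 1  # write parity block to another server
    return new_parity, pairs


def one_way_trips(pairs):
    """Each request/response pair costs two one-way trips (2 IOs)."""
    return 2 * pairs


# Tiny stripe with two data blocks; parity is their XOR.
d0, d1 = b"\x0f\xf0", b"\xaa\x55"
parity = bytes(a ^ b for a, b in zip(d0, d1))

new_d0 = b"\x00\xff"
new_parity, pairs = raid5_small_write(d0, parity, new_d0)

# The updated parity still equals the XOR of the current data blocks.
assert new_parity == bytes(a ^ b for a, b in zip(new_d0, d1))

print("RAID5 small write:", one_way_trips(pairs), "one-way trips")
print("plain slave write:", one_way_trips(1), "one-way trips")
print("3 redundant copies:", one_way_trips(3), "one-way trips")
```

Note that nothing in the sketch makes the two writes atomic: a failure between them leaves data and parity inconsistent, which is the slack HAMMER would have to pick up.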
If you really want, you can use vinum over iSCSI to get networked
RAID5, but it will not perform well.
Adding to that (as we have spent the past 12+ months researching all this..)
- there IS prior art, and lots of it. 
- none of it is fast - even over local 'Infiniband'
- the most practical compromise seems to be deferred background
replication to 'pools' that are themselves *hardware* RAID5 (6 or 10).
- 'hammer mirror-stream', especially if done over something faster than
ssh (e.g. locally over 10GigE, iSCSI, or e-SAt over raw Ethernet), is a
primo candidate for having at least one rapid-restoration near-real-time
But at the present state of the art, HAMMER is challenged w/r quotas,
subvolume-only selective replication, and r/w mounting of the mirrored
Quite possibly there will be no 'one size fits all' solution. Too many
compromises that pull in opposing directions.
As has always been the case......
Start with the Wikipedia article on distributed file systems,
particularly the replicated and fault-tolerant ones.
Most are either IBM/Sun/Oracle/$AN-vendor, 'mainframe & big-bucks'
class, ELSE Linux whole-damn-world-in-the-kernel wannabes.
Among the contenders:
- Gluster (problematic getting it to work with fuse on FreeBSD)
- GFarm (wants to link in its own utils)
- MooseFS (compiles sweetly on FreeBSD - but sparse docs)
- Chiron (dirt-simple, but needs manual work if/as/when backends break)
- Ceph (relies on btrfs - which is scary as the btrfs developers claim
'not ready yet..')
Aside from Ceph, most of the others I mention use 'any POSIX fs' for
Chiron, to name one of many, expects those to be already-mounted smbfs
or NFS mounts.
AFAIK, 'POSIX' compatibility includes HAMMER fs, whether over sshfs,
sshftp, NFS, SMBFS, or ...
so ...... 'possibilities abound'.
Speaking from the transpacific fiber private-network alpha test
exposure, there ain't no magic to the network, though!
What folks forget is that the delays introduced by each router or switch
add up, even at 'light speed', to latency that 'puters do not like.
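A back-of-envelope sketch of how that adds up. Every number below is an assumption chosen for illustration (path length, hop count, per-hop delay), not a measurement from any real link; the only physical constant used is that light in fiber covers very roughly 200 km per millisecond.

```python
def one_way_latency_ms(distance_km, hops, per_hop_ms, km_per_ms=200.0):
    """Propagation delay in fiber plus a fixed delay at each router/switch."""
    propagation_ms = distance_km / km_per_ms
    return propagation_ms + hops * per_hop_ms


# Hypothetical transpacific path: 9000 km of fiber, 15 hops at 0.1 ms each.
one_way = one_way_latency_ms(distance_km=9000, hops=15, per_hop_ms=0.1)
round_trip = 2 * one_way

# A RAID5-style small write needing 4 sequential round trips over that path:
raid5_write = 4 * round_trip

print(f"one way:      {one_way:.1f} ms")
print(f"round trip:   {round_trip:.1f} ms")
print(f"4 round trips: {raid5_write:.1f} ms")
```

Whatever the real figures are for a given link, the per-hop term scales with the number of devices in the path and the whole sum is paid again on every round trip.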
One can hope for paired electron technology.... but not 'soon'