Bill Hacker wbh at conducive.org
Mon Mar 2 21:27:14 PST 2009

Dmitri Nikulin wrote:
On Tue, Mar 3, 2009 at 1:08 PM, Mag Gam <magawake at gmail.com> wrote:
I was wondering if HAMMER will ever have network based RAID 5. After
researching several file systems it seems HAMMER is probably the
closest to achieving this, and it would make HAMMER a pioneer.
Intuitively I highly doubt network RAID5 is worth it. Even local disk
RAID5 is unusable for many workloads.
In contrast, check out some of the more flexible RAID10 modes
available in Linux:
You can get N/M effective space (N raw storage / M copies) with
RAID0-like striping for all of it. It performs very well and certainly
much better than the parity-based RAID5.
Imagine how RAID5 would work with network devices:

Read old data block from one server
Read parity block from another server
Generate new parity block
Write data block to one server
Write parity block to another server
All with NO atomicity guarantees, so HAMMER would have to pick up the
slack. Even in the best case you have 8x the latency of a single trip
to a machine (4 request/response pairs of 2 IOs each). Compare that to
a single round trip (2 IOs) to write to a plain slave, or N round trips
for N redundant copies. What is an acceptable penalty on local disks
is pretty heavy for network storage.
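
For anyone who wants to see the five steps above spelled out, here is a
minimal sketch in Python. It is not HAMMER or vinum code. The dicts are
hypothetical stand-ins for remote block servers, and the names
xor_blocks and raid5_small_write are made up for illustration. It only
shows the XOR parity update and counts request/response pairs.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def raid5_small_write(data_server, parity_server, key, new_data):
    """Read-modify-write of one data block plus its parity block.

    data_server and parity_server are plain dicts standing in for two
    remote machines (an assumption for this sketch). Returns the number
    of request/response pairs a real network would need.
    """
    pairs = 0
    old_data = data_server[key]          # read old data block
    pairs += 1
    old_parity = parity_server[key]      # read old parity block
    pairs += 1
    # New parity = old parity XOR old data XOR new data.
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    data_server[key] = new_data          # write data block
    pairs += 1
    parity_server[key] = new_parity      # write parity block
    pairs += 1
    return pairs


# One stripe with three data blocks; only d1 lives on the server we touch.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
new_d1 = b"\xaa\xbb"
data_srv = {"stripe0": d1}
parity_srv = {"stripe0": xor_blocks(xor_blocks(d0, d1), d2)}

pairs = raid5_small_write(data_srv, parity_srv, "stripe0", new_d1)
one_way_trips = pairs * 2  # each pair is a request plus a response
print(pairs, one_way_trips)
```

Nothing in the sketch makes the two writes atomic. If the process dies
between them, the parity no longer matches the data, which is exactly
the slack the filesystem would have to pick up.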
If you really want, you can use vinum over iSCSI to get networked
RAID5, but it will not perform well.

Adding to that (as we have spent the past 12+ months researching all this...)

- there IS prior art, and lots of it. [1]

- none of it is fast - even over local 'Infiniband'

- the most practical compromise seems to be deferred background 
replication to 'pools' that are themselves *hardware* RAID5 (or 6 or 10).

- 'hammer mirror-stream', especially if done over something faster than 
ssh (e.g. locally over 10GigE, iSCSI, or e-SAt over raw Ethernet), is a 
primo candidate for having at least one rapid-restoration near-real-time 

But at the present state of the art, HAMMER is challenged w.r.t. quotas, 
subvolume-only selective replication, and r/w mounting of the mirrored 

Quite possibly there will be no 'one size fits all' solution. Too many 
compromises pull in opposing directions.

As has always been the case......



[1] Start with the Wikipedia article on distributed file systems, 
particularly the replicated and fault-tolerant ones.

Most are either IBM/Sun/Oracle/$AN-vendor, 'mainframe & big-bucks' 
class, ELSE Linux whole-damn-world-in-the-kernel wannabes.

Among the contenders:

- Gluster (problematic getting it to work with fuse on FreeBSD)

- GFarm (wants to link in its own utils)

- MooseFS (compiles sweetly on FreeBSD - but sparse docs)

- Chiron (dirt-simple, but needs manual work if/as/when backends break)

- Ceph (relies on btrfs - which is scary as the btrfs developers claim 
'not ready yet..')

Aside from Ceph, most of the others I mention use 'any POSIX fs' for 
eventual store.

Chiron, to name one of many, expects those to be already-mounted smbfs 
or NFS mounts.

AFAIK, 'POSIX' compatibility includes HAMMER fs, whether over sshfs, 
sshftp, NFS, SMBFS, or ...

so ...... 'possibilities abound'.

Speaking from the transpacific fiber private-network alpha test 
exposure, there ain't no magic to the network, though!

What folks forget is that the delays introduced by each router or switch 
add up - even at 'light speed' - to latency 'puters do not like.
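
A rough back-of-envelope sketch of how that adds up, in Python. Every
number here is an assumption picked for illustration, not a measurement
from our link: roughly 200,000 km/s for light in fiber, a 9,000 km
path, 12 hops, and 0.5 ms of delay per hop.

```python
# Assumption: light in fiber travels at roughly 2/3 of c.
FIBER_KM_PER_S = 200_000.0


def one_way_latency_ms(distance_km, hops, per_hop_ms):
    """Propagation delay over the fiber plus a fixed delay per hop."""
    propagation_ms = distance_km / FIBER_KM_PER_S * 1000.0
    return propagation_ms + hops * per_hop_ms


def round_trip_ms(distance_km, hops, per_hop_ms):
    """One request plus one response over the same path."""
    return 2 * one_way_latency_ms(distance_km, hops, per_hop_ms)


# Hypothetical transpacific path: 9,000 km, 12 hops, 0.5 ms per hop.
example_rtt = round_trip_ms(9000.0, 12, 0.5)
print(f"round trip: {example_rtt:.1f} ms")
```

The propagation term is fixed by distance, so no amount of bandwidth
buys it back. Every extra request/response pair in a protocol pays the
whole round trip again.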

One can hope for paired electron technology.... but not 'soon'

