Wed Feb 14 16:59:37 PST 2007

From: "Simon 'corecode' Schubert" <corecode at>
Subject: Re: Plans for 1.8+ (2.0?)
Date: Thu, 15 Feb 2007 01:49:34 +0100
Chris Csanady wrote:
> Yes, I was considering it as a replacement for RAID-5.  The idea being
> that, for a given filesystem block, you would divide it into
> sub-blocks and compute ECC blocks.  These would then be distributed
> across the cluster nodes.
> For example, consider a 32kB filesystem block.  Divide it into 4kB
> sub-blocks, and compute 3 4kB ECC blocks.  Now, distribute those 11
> blocks over 11 separate nodes.  Any three nodes can fail, plus space
> overhead is only 38% in this case.  To provide the same guarantee with
> mirroring would carry a 300% overhead.  While mirroring may be
> acceptable in terms of disk space, network I/O will likely be a
> problem.

How do you save on network IO there?  You have to query 8(!) boxes to
retrieve one block.  Okay, you might choose 8 out of 11, but that's
still a lot.  For writing, you of course have to write to all 11.  If
you go mirroring, you can run the complete block from one source (you
can of course also interleave with a mirror).  For writing, you can use
multicast/broadcast on LAN.  That makes mirrored writes as efficient as
normal writes.  When you do ECC, you have to write all 138%.  If you run
over WAN, you won't be able to save with multicast probably, but then
your block distribution will make it really hard to get a constant
stream due to massive jitter.
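
To put rough numbers on the comparison, here is a back-of-the-envelope
sketch in Python.  The function names and the result fields are made up
for illustration, and the model is deliberately crude: it counts only
nodes contacted and data sent by the writer, relative to one filesystem
block, and it assumes a multicast write costs the sender a single copy.
Latency and jitter are not modelled.

```python
def ecc_scheme(data_blocks, ecc_blocks):
    """Hypothetical model: one block split into data + ECC sub-blocks."""
    total = data_blocks + ecc_blocks
    return {
        "failures_tolerated": ecc_blocks,
        "space_overhead": ecc_blocks / data_blocks,
        "nodes_per_read": data_blocks,       # any data_blocks of the total
        "nodes_per_write": total,
        "write_volume": total / data_blocks, # in units of one block
    }


def mirror_scheme(extra_copies, multicast):
    """Hypothetical model: one full copy of the block per node."""
    copies = 1 + extra_copies
    return {
        "failures_tolerated": extra_copies,
        "space_overhead": float(extra_copies),
        "nodes_per_read": 1,                 # whole block from one source
        "nodes_per_write": copies,
        # Assumption: with multicast the sender transmits the block once.
        "write_volume": 1.0 if multicast else float(copies),
    }


if __name__ == "__main__":
    # 32kB block, 4kB sub-blocks: 8 data + 3 ECC, versus 4 full copies.
    for name, scheme in [
        ("ecc 8+3", ecc_scheme(8, 3)),
        ("mirror x4, multicast", mirror_scheme(3, multicast=True)),
        ("mirror x4, unicast", mirror_scheme(3, multicast=False)),
    ]:
        print(name, scheme)
```

Under these assumptions the 8+3 layout reads from 8 nodes and writes
1.375 blocks' worth of data to 11 nodes, while four-way mirroring reads
from 1 node and, on a LAN with multicast, sends the block once.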

I'm not yet convinced :)  Disk space is really cheap these days.


