Plans for 1.8+ (2.0?)
dillon at apollo.backplane.com
Tue Feb 13 14:05:31 PST 2007
:Are you also considering redundancy beyond basic mirroring? Over an
:unreliable network, it would be desirable to mirror data at least
:three ways, and this gets very expensive storage-wise. While it is
:computationally expensive today, I think it would be very useful to
:support Reed-Solomon ECC blocks. Computation is cheaper than network
:I/O in many cases, and will only become more so in the future.
:Also, for anonymous clustering, encryption seems like a necessity as
:well. Or at least something that should be considered in the design.
Well, as a replacement for something like RAID-5, then yes, it
would be doable. Frankly, though, hard drive capacities are such
that it is almost universally better to mirror the data now than
to add ECC- or parity-based redundancy. Hard drive capacities will
increase by another 10x in the next 5 years.
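For a rough sense of the storage trade-off being discussed, here is a minimal Python sketch (not from the original post) comparing the raw-storage multiplier of n-way mirroring against a single XOR parity block over k data blocks, which is the per-stripe scheme RAID-5 uses. All function names and the 4-block stripe are hypothetical illustration values, and XOR parity is a simpler stand-in for the Reed-Solomon codes mentioned above.

```python
from functools import reduce


def mirror_factor(copies: int) -> float:
    """Raw bytes stored per byte of user data with n-way mirroring."""
    return float(copies)


def parity_factor(data_blocks: int, parity_blocks: int = 1) -> float:
    """Raw bytes stored per byte of user data with k data + m parity blocks."""
    return (data_blocks + parity_blocks) / data_blocks


def xor_parity(blocks: list[bytes]) -> bytes:
    """Compute a single XOR parity block over equal-sized data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover one lost data block by XORing the parity with the survivors."""
    return xor_parity(surviving + [parity])


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # hypothetical 4-block stripe
    parity = xor_parity(stripe)
    lost = stripe.pop(2)  # simulate losing the drive holding block 2
    assert rebuild_missing(stripe, parity) == lost
    print(mirror_factor(3), parity_factor(4))  # prints 3.0 1.25
```

The comparison is not like-for-like: single parity survives only one lost block while a three-way mirror survives two, and a parity rebuild has to read every surviving block in the stripe, whereas a mirror rebuild reads just one copy.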
:However, it would be good to have an option to require that data be on
:redundant storage before returning. (At least two copies, perhaps one
:on another cluster node on the local LAN.) After this is done,
:perhaps you could distribute the data and ECC blocks to several
:machines across the network.
ECC blocks wouldn't help here. A data integrity hash, sure, but
not an ECC block. Data stored on a hard drive is already ECC'd
internally, so you just don't see the sort of correctable corruption
over the wire any more. The only type of bit corruption people see
now occurs when the DMA hardware is broken (ATA most typically has
this problem), in which case simply re-reading the data is the solution.
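The "integrity hash plus re-read" approach above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the filesystem's actual mechanism: the hypothetical `read_block` callable stands in for a device read, SHA-256 is an arbitrary choice (the post does not name a hash), and the retry limit of 3 is made up.

```python
import hashlib
from typing import Callable


def read_verified(read_block: Callable[[], bytes], expected_sha256: bytes,
                  max_attempts: int = 3) -> bytes:
    """Re-read a block until its SHA-256 digest matches the stored one.

    read_block is a hypothetical stand-in for the real device read; a
    transient DMA error is assumed to go away on a subsequent read.
    """
    for _ in range(max_attempts):
        data = read_block()
        if hashlib.sha256(data).digest() == expected_sha256:
            return data
    raise OSError(f"block failed integrity check after {max_attempts} reads")


if __name__ == "__main__":
    good = b"hello, block"
    digest = hashlib.sha256(good).digest()
    reads = iter([b"hellp, block", good])  # first read corrupted in transit
    assert read_verified(lambda: next(reads), digest) == good
```

A hash only detects corruption; recovery comes from reading again (or from a mirror copy), which is why no ECC block is needed in this scheme.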
One could require synchronization to more than one physical
media target before allowing an fsync() to return, but the performance
hit would be rather severe.
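To show where that cost comes from, here is a user-space Python sketch, assuming two ordinary files stand in for separate physical media. A real implementation would live inside the filesystem, and `replicated_write` is a hypothetical name. The call cannot return until every replica has been fsync()ed, so its latency is bounded by the slowest target.

```python
import os
import tempfile


def replicated_write(paths: list[str], data: bytes) -> None:
    """Write data to every path and fsync each one before returning.

    Each path is assumed to sit on a different physical device; here
    they are just files, purely for illustration.
    """
    fds = []
    try:
        for path in paths:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            fds.append(fd)
            view = memoryview(data)
            while view:  # os.write may perform a short write
                written = os.write(fd, view)
                view = view[written:]
        for fd in fds:
            os.fsync(fd)  # blocks until this replica is on stable storage
    finally:
        for fd in fds:
            os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        targets = [os.path.join(d, "replica0"), os.path.join(d, "replica1")]
        replicated_write(targets, b"committed record\n")
```

Issuing the fsync() calls in parallel would overlap the waits, but the caller would still block until the slowest device finishes.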
<dillon at backplane.com>
More information about the Kernel