RAID 1 or Hammer

Matthew Dillon dillon at apollo.backplane.com
Mon Jan 12 19:18:40 PST 2009


    I've seen uncaught data corruption on older machines, but not in the
    last few years.  Ah, the days of IDE cabling problems, remembered
    fondly (or not).  I've seen bad data get through TCP connections
    uncaught!  Yes, it actually does happen, even more so now that OSes
    increasingly depend on CRC checking done by the Ethernet device.
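
    One reason bad data can slip through is that the TCP checksum is only a
    16-bit ones' complement sum (RFC 1071). When checksumming is offloaded
    to the NIC, nothing above the NIC re-verifies the payload. The minimal
    Python sketch below implements that style of checksum.
    `internet_checksum` is a hypothetical name, not an OS API. The example
    shows one class of corruption the sum cannot see: swapping two 16-bit
    words leaves the sum unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071 (sketch)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add the next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back into the low 16 bits
    return ~total & 0xFFFF


sent = b"ABCDEFGH"
received = b"CDABEFGH"  # first two 16-bit words swapped in transit

# Addition is order-independent, so the swap goes unnoticed.
print(internet_checksum(sent) == internet_checksum(received))  # True
```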

    ZFS uses its integrity check data for a lot more than simple validation.
    It passes the information down into the I/O layer and this allows the
    I/O layer (aka the software-RAID layer) to determine which underlying
    block is the correct one when presented with multiple choices.  So, for
    example, if data is mirrored the ZFS I/O layer can determine which of
    the mirrored blocks is valid... A, B, both, or neither.
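
    The following is a minimal sketch of the mirror case. It assumes SHA-256
    as the checksum (ZFS lets the administrator choose the algorithm) and
    assumes the expected checksum is stored outside the block itself, for
    example in a parent pointer. `checksum`, `valid_copies` and
    `mirrored_read` are hypothetical names used for illustration, not ZFS
    code. Plain RAID 1 can only notice that the two legs disagree. With the
    expected checksum in hand, the read path can tell which leg is good and
    rewrite the other.

```python
from __future__ import annotations

import hashlib


def checksum(block: bytes) -> bytes:
    # Assumption: SHA-256 stands in for the pool's configured checksum algorithm.
    return hashlib.sha256(block).digest()


def valid_copies(expected: bytes, copies: list[bytes]) -> list[int]:
    """Indices of mirror copies matching the expected checksum:
    [0] = A, [1] = B, [0, 1] = both, [] = neither."""
    return [i for i, copy in enumerate(copies) if checksum(copy) == expected]


def mirrored_read(expected: bytes, legs: list[bytearray]) -> bytes:
    """Return verified data and overwrite any leg that holds a bad copy."""
    good = valid_copies(expected, [bytes(leg) for leg in legs])
    if not good:
        raise OSError("no mirror leg matches the expected checksum")
    data = bytes(legs[good[0]])
    for i, leg in enumerate(legs):
        if i not in good:
            leg[:] = data  # repair the bad leg from a verified copy
    return data


block = b"important data"
expected = checksum(block)  # hypothetically held by the parent block pointer
a = bytearray(block)
b = bytearray(b"importent data")  # leg B silently corrupted

print(valid_copies(expected, [bytes(a), bytes(b)]))  # [0]
print(mirrored_read(expected, [a, b]) == block)      # True
print(bytes(b) == block)                             # True: leg B was rewritten
```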

    People have debunked Sun's tests insofar as they pertain to a properly
    functioning RAID system.  But ZFS also handles any Black Swan that shows
    up anywhere in the I/O path.  A Black Swan is an unexpected condition:
    for example, an obscure software bug in one of the many layers of
    firmware that the data passes through.  Software is so complex these
    days that there are plenty of ways the data can get lost or corrupted
    without necessarily causing actual corruption at the physical layer.
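
    The sketch below models one such Black Swan, a phantom write. The
    firmware acknowledges a write but never stores it, so the disk later
    returns the old block, which is perfectly intact at the physical layer.
    The `Disk` class and its `drop_next_write` flag are hypothetical. A
    checksum stored next to the data still matches the stale block. A
    checksum kept in the parent (ZFS keeps block checksums in the parent
    block pointer) does not match.

```python
from __future__ import annotations

import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Disk:
    """Hypothetical disk whose firmware can silently drop a write."""

    def __init__(self) -> None:
        # Each sector holds its data plus a checksum of that same data.
        self.sectors: dict[int, tuple[bytes, bytes]] = {}
        self.drop_next_write = False

    def write(self, lba: int, data: bytes) -> None:
        if self.drop_next_write:
            self.drop_next_write = False  # bug: write is acknowledged but never stored
            return
        self.sectors[lba] = (data, checksum(data))

    def read(self, lba: int) -> tuple[bytes, bytes]:
        return self.sectors[lba]


disk = Disk()
disk.write(7, b"version 1")

# The parent records the checksum of what it believes it wrote.
parent_checksum = checksum(b"version 2")
disk.drop_next_write = True
disk.write(7, b"version 2")  # phantom write: never reaches the media

data, local_sum = disk.read(7)
self_check_ok = checksum(data) == local_sum        # stale block is internally consistent
end_to_end_ok = checksum(data) == parent_checksum  # parent knows this is the wrong block

print(self_check_ok, end_to_end_ok)  # True False
```

    Keeping the checksum in the parent is the design choice that matters
    here. The parent records what the child should contain, independent of
    anything the device reports back.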

						-Matt