RAID 1 or Hammer
Matthew Dillon
dillon at apollo.backplane.com
Mon Jan 12 19:18:40 PST 2009
I've seen uncaught data corruption on older machines, but not in the
last few years. Ah, the days of IDE cabling problems, remembered
fondly (or not). I've seen bad data get through TCP connections
uncaught! Yes, it actually does happen, even more so now that OSes
are depending more and more on CRC checking done by the Ethernet device.

ZFS uses its integrity check data for a lot more than simple validation.
It passes the information down into the I/O layer and this allows the
I/O layer (aka the software-RAID layer) to determine which underlying
block is the correct one when presented with multiple choices. So, for
example, if data is mirrored the ZFS I/O layer can determine which of
the mirrored blocks is valid... A, B, both, or neither.
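
To make the mirrored case concrete, here is a minimal sketch of the idea, not ZFS's actual code. It assumes the expected checksum is stored somewhere other than in the mirrored blocks themselves, and it uses CRC32 purely as a stand-in for whatever checksum the filesystem really uses. The names classify_mirror and read_mirrored are hypothetical.

```python
import zlib


def classify_mirror(block_a: bytes, block_b: bytes, expected_crc: int) -> str:
    """Report which mirror copy matches the independently stored checksum.

    Returns "A", "B", "both", or "neither". CRC32 is only a stand-in
    here (an assumption for illustration), not what ZFS uses.
    """
    a_ok = zlib.crc32(block_a) == expected_crc
    b_ok = zlib.crc32(block_b) == expected_crc
    if a_ok and b_ok:
        return "both"
    if a_ok:
        return "A"
    if b_ok:
        return "B"
    return "neither"


def read_mirrored(block_a: bytes, block_b: bytes, expected_crc: int) -> bytes:
    """Return a copy that passes the check, or fail if none does."""
    verdict = classify_mirror(block_a, block_b, expected_crc)
    if verdict == "neither":
        raise IOError("both mirror copies fail the integrity check")
    # Prefer A when it is valid; fall back to B otherwise.
    return block_a if verdict in ("A", "both") else block_b
```

A plain mirror without the stored checksum can only notice that A and B differ. It has no way to tell which of the two is the good one, which is the point of passing the integrity data down to the I/O layer.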

People have debunked Sun's tests as far as a properly functioning RAID
system is concerned. But ZFS also handles any Black Swan that shows up in the
entire I/O path. A Black Swan is an unexpected condition. For example,
an obscure software bug in the many layers of firmware that the data
passes through. Software is so complex these days there are plenty of
ways the data can get lost or corrupted without necessarily causing
actual corruption at the physical layer.
-Matt