RAID 1 or Hammer
Bjørn Vermo
bv at opera.com
Wed Jan 14 02:50:10 PST 2009
On 14 Jan 2009, at 09:32, Simon 'corecode' Schubert wrote:
Bjørn Vermo wrote:
I've seen uncaught data corruption on older machines, but not in the
last few years. Ah, the days of IDE cabling problems, remembered
fondly (or not). I've seen bad data get through TCP connections
uncaught! Yes, it actually does happen, even more so now that OSes
depend more and more on CRC checking done by the Ethernet device.
...
Modern (meaning anything with an ATA or SCSI controller in it)
drives will do so much error checking and recovery that the time
between externally noticeable failures and total breakdown will be
very short.
This seems to fail sometimes. Recent work [1] has shown that silent
data corruption on HDDs is more common than expected.
Interesting report, but I do not think 0.06% of drives indicates a
serious problem. That could be my perspective - I come from a past
where 14" disks in replaceable open stacks were the norm. What
surprises me is that his findings in some ways are the opposite of
what Google have reported, especially that SATA drives had an order of
magnitude more errors than SCSI (FC) drives. It matches my experience
with ATA drives, though.
Since this study was done using one brand of RAID hardware and
interconnects, one may wonder how many of the errors are due to issues
with hardware and firmware external to the drives.
His comments on location-specific error-proneness in certain drive
models do not come as a surprise. I have toyed with the idea that
one drive in a mirrored pair ought to have the addresses inverted, so
sector 1 on one drive is mapped to sector MAX - 1 on the other.
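
A minimal sketch of that mapping, in Python for brevity. It assumes MAX
is the highest valid sector address rather than the sector count, so
sector 0 lands on MAX and sector 1 on MAX - 1. The function name
mirror_lba and its parameters are made up for illustration and do not
come from any existing RAID implementation.

```python
def mirror_lba(lba: int, max_lba: int) -> int:
    """Return the address on the second drive for a sector on the first.

    Assumption: max_lba is the highest valid sector address, not the
    number of sectors, so valid addresses are 0..max_lba inclusive.
    """
    if not 0 <= lba <= max_lba:
        raise ValueError(f"sector {lba} is outside 0..{max_lba}")
    # Inversion: the start of one drive maps to the end of the other.
    return max_lba - lba


if __name__ == "__main__":
    max_lba = 999  # hypothetical 1000-sector drive
    for lba in (0, 1, 500, 999):
        print(lba, "->", mirror_lba(lba, max_lba))
```

The mapping is its own inverse, so the same function translates in both
directions. A region that is weak on a given drive model then holds
different data on each half of the pair.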
--
Bjørn Vermo
Core networking
Opera Software ASA