RAID 1 or Hammer

Bjørn Vermo bv at
Wed Jan 14 02:50:10 PST 2009

On 14 Jan 2009, at 09:32, Simon 'corecode' Schubert wrote:

Bjørn Vermo wrote:
 I've seen uncaught data corruption on older machines, but not in the
 last few years.  Ah, the days of IDE cabling problems, remembered
 fondly (or not).  I've seen bad data get through TCP connections
 uncaught!  Yes, it actually does happen, even more so now that OS's
 are depending more and more on CRC checking done by the ethernet

Modern (meaning anything with an ATA or SCSI controller in it)
drives will do so much error checking and recovery that the time
between externally noticeable failures and total breakdown will be
very short.

This seems to fail sometimes.  Recent work [1] has shown that silent
data corruption on HDDs is more common than expected.

Interesting report, but I do not think 0.06% of drives indicates a
serious problem. That could be my perspective: I come from a past
where 14" disks in replaceable open stacks were the norm. What
surprises me is that his findings are in some ways the opposite of
what Google has reported, especially that SATA drives had an order of
magnitude more errors than SCSI (FC) drives. It matches my experience
with ATA drives, though.

Since this study was done using one brand of RAID hardware and  
interconnects, one may wonder how many of the errors are due to issues  
with hardware and firmware external to the drives.

His comments on location-specific error-proneness in certain drive
models do not come as a surprise. I have toyed with the idea that
one drive in a mirrored pair ought to have its addresses inverted, so
that sector 1 on one drive is mapped to sector MAX - 1 on the other.

Bjørn Vermo
Core networking
Opera Software ASA
