More on vinum woes
David Cuthbert
dacut at kanga.org
Wed Sep 14 01:26:35 PDT 2005
Matthew Dillon wrote:
> I don't agree re: SCSI RAID. It used to be true that SCSI was
> superior not just in the reliability of the bus protocol but also
> in the actual hardware. I remember back in the day when seagate
> waxed poetic about all the work they did to make their SCSI drives
> more robust, and I gladly paid for SCSI drives.
Well, having worked for Seagate, maybe I'm just spouting their Kool-Aid
here. :-) But the production quality of the components that went into
SCSI drives far exceeded that of the ATA line (which was extremely
cost-sensitive compared to the SCSI line).
I'm not saying *you* should consider SCSI... just that if you're running
something which requires serious uptime (as in it's unacceptable to have
more than a second of downtime per year), you're pretty much looking at
SCSI. Actually, you're pretty much looking at an EMC box or somesuch,
which will use SCSI only. And having an EMC apps engineer on call 24/7.
It's simple statistics: if you need 99.999% uptime, then your components
have to be much better, and you're going to pay a pretty penny for even
marginal improvements. (On the flip side, most people who say they need
99.999% uptime suddenly don't when they find out just how expensive it
is. :-)
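(To put rough numbers on that: five nines works out to a bit over five
minutes of allowed downtime per year. Here's a quick back-of-the-envelope
sketch of the arithmetic in Python; the script and its names are just my
illustration, not taken from any real availability tool.)

# Back-of-the-envelope downtime budget for N "nines" of availability.
# Illustrative only; real availability planning also has to account for
# planned maintenance, partial outages, and so on.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def downtime_budget_seconds(nines):
    """Allowed downtime in seconds per year, e.g. nines=5 -> 99.999% uptime."""
    return SECONDS_PER_YEAR * 10.0 ** (-nines)

for n in range(2, 7):
    pct = 100.0 * (1.0 - 10.0 ** (-n))
    print(f"{n} nines ({pct:.5f}%): {downtime_budget_seconds(n) / 60:.2f} min/year")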
> SATA is clearly just as reliable a bus protocol.
Yes and no. ATA's protocol is reliable (even though the signalling
sucks)... it's more that the chipset vendors (mostly) play fast and
loose with the rules. (I've been extremely disappointed by the
deteriorating quality of chipsets and flagrant lack of testing.)
I've already seen one SATA setup go completely unreliable thanks to a
chipset which had a tendency to freeze the north bus when attached to an
NCQ SATA drive.
> Also, modern drives have far
> fewer moving parts and far smaller (and fewer) heads,
Smaller isn't necessarily better. Smaller sliders (the black thing you
can actually see) *are* good, because when they hit the disk (and they
will, even on a disk which appears to be operating at 100%) it means
less mass, less momentum, less debris. The head itself, though, is also
smaller, which is bad -- it takes less to start eroding away the GMR stripe.
I don't think there is that much of a difference in the number of moving
parts -- in fact, IBM added more a few years back when they started
doing load/unload of the head off the platters during power down. (I
think, but am not sure, that this has been pretty much replaced with
laser texturing a zone on the platters so you can park the head there.)
> and it's hard
> to differentiate the robustness for commercial vs consumer models
> by using (e.g.) more robust materials because of that.
Keep in mind that some fancy, new robust materials end up not working
out so well. Generally, the newest technology goes to laptop drives
first (where it's all about trying to squeeze as much as possible on
those 2.5" platters), then to consumer desktops, then (once it's proven)
to the enterprise lines.
> Software raid is a fine solution as long as your computer doesn't
> crash and as long as you have the extra cpu and bandwidth to spare.
> i.e. it can handle drive faults just fine, but it isn't so good handling
> operating system faults or power failures due to the lack of
> battery-backed cache, and it costs a lot of extra cpu to do something
> like RAID-5 in software.
Never had a crash with Vinum on FreeBSD 4.x; on Linux, the software
RAID layer will rebuild the array in the background after a crash.
(It's slow, but if you have the CPU to spare, you can probably afford
to let it run overnight, like I do.)
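(If you want to keep an eye on that background rebuild without
babysitting a console, the Linux md driver reports progress in
/proc/mdstat. A minimal sketch in Python, assuming a Linux software-RAID
box; the script itself is just my illustration, not part of any md
tooling:)

# Minimal sketch: print any ongoing Linux software-RAID resync/recovery
# lines from /proc/mdstat. Assumes a Linux md setup; illustrative only.

def rebuild_status(path="/proc/mdstat"):
    """Return the lines of /proc/mdstat that mention a resync or recovery."""
    with open(path) as f:
        return [line.rstrip() for line in f
                if "resync" in line or "recovery" in line]

if __name__ == "__main__":
    lines = rebuild_status()
    if lines:
        for line in lines:
            print(line)
    else:
        print("no rebuild in progress")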
With all the above in mind, when I finish configuring my new server, it'll
use the exact setup you're describing: 3Ware SATA RAID. (My old server
got zapped by a direct lightning hit to my old house days before we
left... need to get into the new place before I get everything out of
storage...)