more context - Re: nasty dev strategy related ata-raid bug

Andrew Atrens atrens at nortelnetworks.com
Mon Jul 19 08:41:42 PDT 2004


Hi Folks,

I didn't have time to elaborate much on this last night.
Sorry for the lack of context. ;)

I upgraded my home box from 4.9 to DragonFly 1.0a. On boot of the
DragonFly kernel, the ata-raid subsystem marked my RAID array as
broken and refused to mount it. It then wrote its findings out
to the RAID config on disk, and on the next reboot the BIOS
announced that the array was severely damaged and wouldn't allow
me to boot from it. My only recourse was to use the BIOS to re-create
the array and rebuild my partition tables from scratch.

This would be a pretty HARSH introduction to DragonFly for a newbie :(

Fortunately, with a bit of help from Highpoint Tech, I managed to figure
out my RAID settings and rebuilt the array quickly. Then someone on the
FreeBSD-stable list suggested scan_ffs. That tool is a real lifesaver
for rebuilding partition tables!!! At any rate, I'm happy to say that
although it took me almost 5 days, I did fully recover the system (am
typing on it now).

The problem lies in ata-raid.c. There's a sanity check on si_disk, and
in DragonFly si_disk comes back NULL, so the check fails. Why does
ata-raid even care about si_disk? Well, in FreeBSD AR_STRATEGY was
defined as -

#define AR_STRATEGY(x) (x)->b_dev->si_disk->d_devsw->d_strategy((x))

In DragonFly it's -

#define AR_STRATEGY(x)     dev_dstrategy((x)->b_dev, x)

So, near as I can tell si_disk isn't useful to ata-raid anymore. Well,
in DragonFly at least.
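
To make the failure mode concrete, here is a minimal userland sketch of
what I believe the check ends up doing. It is not kernel code: the
fake_dev and fake_member structs, the flag values, and the check_member
and check_array helpers are all made up for illustration. Only the
AR_DF_PRESENT / AR_DF_ONLINE names and the shape of the test come from
ata-raid.c.

```c
#include <stddef.h>

/* Flag values are illustrative only; the real ones live in the ata-raid headers. */
#define AR_DF_PRESENT 0x01
#define AR_DF_ONLINE  0x02

/* Hypothetical stand-ins for the kernel's device and RAID member structures. */
struct fake_dev {
    void *si_disk;              /* NULL models what the check sees on DragonFly */
};

struct fake_member {
    int flags;
    struct fake_dev *dev;
};

/*
 * Same shape as the sanity check: a member that is present and online but
 * whose device has no si_disk gets its ONLINE flag cleared.
 * Returns 1 if the member's state changed, 0 otherwise.
 */
static int check_member(struct fake_member *m)
{
    const int up = AR_DF_PRESENT | AR_DF_ONLINE;

    if ((m->flags & up) == up && m->dev->si_disk == NULL) {
        m->flags &= ~AR_DF_ONLINE;
        return 1;
    }
    return 0;
}

/* Run the check over every member and count how many were taken offline. */
static int check_array(struct fake_member *members, int n)
{
    int changed = 0;

    for (int i = 0; i < n; i++)
        changed += check_member(&members[i]);
    return changed;
}
```

If every member's si_disk is NULL, check_array() takes every healthy
member offline in a single pass. In the sketch that only flips a flag;
in the real driver the changed state is then written out to the RAID
config on disk, which is where the damage comes from.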

My DragonFly 'fix' was to `#if 0' out the block in ata-raid.c that does
the sanity check on si_disk. As things stand the check always fails, and
when it does the cascading events will trash your RAID config :( :( ...

I think you folks should fix this ASAP and warn people about it.

Andrew.



Andrew Atrens wrote:
This one is pretty nasty. In my case it caused ata-raid to mark my RAID
array broken, then write that config out to the array. On the subsequent
reboot the Highpoint BIOS wouldn't boot because the RAID had been marked
damaged. The only way forward was to destroy and recreate the array
using the same settings and rebuild the partition tables from scratch
(a task made less harrowing by scan_ffs).

Sorry about the line wrapping, my email client is having a bad day ;)

Andrew.

#if 0
            /*
             *  Very Harmful
             */
            if ((rdp->disks[buf1->drive].flags &
                 (AR_DF_PRESENT|AR_DF_ONLINE))==(AR_DF_PRESENT|AR_DF_ONLINE) &&
                !AD_SOFTC(rdp->disks[buf1->drive])->dev->si_disk) {
                rdp->disks[buf1->drive].flags &= ~AR_DF_ONLINE;
                change = 1;
            }
            if ((rdp->disks[buf1->drive + rdp->width].flags &
                 (AR_DF_PRESENT|AR_DF_ONLINE))==(AR_DF_PRESENT|AR_DF_ONLINE) &&
                !AD_SOFTC(rdp->disks[buf1->drive + rdp->width])->dev->si_disk) {
                rdp->disks[buf1->drive + rdp->width].flags &= ~AR_DF_ONLINE;
                change = 1;
            }
#endif






