failing disk, or not?

Bill Hacker wbh at conducive.org
Mon Mar 28 20:25:04 PST 2005


George Georgalis wrote:

I'm seeing some disk errors in dfly that I cannot reproduce when checking
the partition from another OS:
ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 488278315 of 229605520-229605535 (ad4 bn 488278315; cn 30393 tn 234 sn 28) retrying
ad4: UDMA ICRC error writing fsbn 488278315 of 229605520-229605535 (ad4 bn 488278315; cn 30393 tn 234 sn 28) retrying
ad4: UDMA ICRC error writing fsbn 488278443 of 229605584-229605599 (ad4 bn 488278443; cn 30393 tn 236 sn 30) retrying
This happened while running dvdbackup, and I reproduced it running
a dd read from the partition. However, after several attempts I cannot
reproduce it from a Linux badblocks (read or non-destructive write) check
or a Linux dd read from the partition. I know failures can be intermittent,
but not getting any errors at all yet from Linux seems odd at this
point, if the disk is really failing.
Might DFLY be attempting I/O beyond the permitted
end of the assigned area?  Or to an area that Linux
is not trying to access?
# df -h
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad4s3a   248M   122M   106M    54%    /
/dev/ad4s3d   248M   1.3M   227M     1%    /var
/dev/ad4s3e   124G    94G    20G    83%    /usr
procfs        4.0K   4.0K     0B   100%    /proc
A bit of history: I did have a system lockup -- I could switch virtual
terminals but no keyboard input was accepted -- a week or two ago. I
didn't file a bug because I was experimenting haphazardly (in user space)
and couldn't explain well enough, at the time, all I was doing; now I
don't even remember. A fsck was required, and with a 95GB /usr, that
took quite a while. (I welcome comments on why softupdates didn't help
here.)
Best case, SU just leaves data in an earlier state rather than
half-committed.  More transaction-oriented than journaling.
fsck -y doesn't care about the content of data - only about its
proper file indexing, so *maybe* some time saved during
a 'preen', but no savings at all with fsck -y.
Also, the /usr partition was near or over 100% capacity, but I
never got disk full errors, i.e. didn't *completely* run out of space.

It normally has around a 10% reserve, will usually stand 102% before it
even throws an error message.
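
The reserve can be estimated from the df -h output quoted above, since Size minus (Used + Avail) is the space ordinary users cannot allocate. A rough sketch, assuming df reports Capacity as Used / (Used + Avail) and accepting that the inputs are the rounded figures df printed, so the results are approximate:

```python
# Rows copied from the df -h output quoted above: (mount, size, used, avail).
# Units are whatever df printed (MB for / and /var, GB for /usr); only
# ratios are computed, so the unit does not matter within a row.
ROWS = [
    ("/", 248.0, 122.0, 106.0),
    ("/var", 248.0, 1.3, 227.0),
    ("/usr", 124.0, 94.0, 20.0),
]


def reserve_fraction(size, used, avail):
    """Fraction of the filesystem held back from ordinary users."""
    return 1.0 - (used + avail) / size


def capacity_percent(used, avail):
    """Capacity relative to user-allocatable space (assumed df behaviour)."""
    return 100.0 * used / (used + avail)


for mount, size, used, avail in ROWS:
    print(f"{mount:5s} reserve ~{reserve_fraction(size, used, avail):.1%}  "
          f"capacity ~{capacity_percent(used, avail):.1f}%")
```

On these figures the reserve works out nearer 8% than 10% on all three filesystems. Because Capacity is measured against the non-reserved space, it can exceed 100% once root starts eating into the reserve.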
At this point can I be sure my disk is failing or could there be some
driver instability? The full dmesg is below.
Don't see it in dmesg, but ad4 is a 200GB Seagate drive, on an nvidia
SATA controller.  Disk Product Number ST3200822AS, Part Number 9W2854-301
Thanks,
// George

Cutting ...
agp0: <NVIDIA Generic AGP Controller> mem 0xe0000000-0xe3ffffff at device 0.0 on pci0
agp0: Unable to find NVIDIA Memory Controller 1.
Unable? That's odd ?

device_probe_and_attach: agp0 attach returned 19
isab0: <PCI to ISA bridge (vendor=10de device=00e0)> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <unknown card> (vendor=0x10de, dev=0x00e4) at 1.1 irq 10
NVIDIA  - nForce3 250 SMBus Controller ?

*SNIP*

atapci0: <Generic PCI ATA controller> port 0xf000-0xf00f at device 8.0 on pci0
ata0: at 0x1f0 irq 14 on atapci0
installed MI handler for int 14

ata1: at 0x170 irq 15 on atapci0
installed MI handler for int 15
atapci1: <Generic PCI ATA controller> port 0xec00-0xec7f,0xeb00-0xeb0f,0xb70-0xb73,0x970-0x977,0xbf0-0xbf3,0x9f0-0x9f7 irq 11 at device 10.0 on pci0

ata2: at 0x9f0 on atapci1
installed MI handler for int 11

ata3: at 0x970 on atapci1
*snip*

ad0: 58644MB <Maxtor 6Y060L0> [119150/16/63] at ata0-master BIOSDMA

ad4: DMA limited to UDMA33, non-ATA66 cable or device
ad4: 190782MB <ST3200822AS> [387621/16/63] at ata2-master BIOSDMA
I'm puzzled:

- ata0-master claims /dev/ad0

- ata1-master claims /dev/acd0

- ata2-master claims /dev/ad4

- ata3 seems empty...

So how do we skip /dev/ad1, /dev/ad2, and /dev/ad3 to arrive at /dev/ad4?
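
One possible explanation, offered as an assumption rather than something confirmed for this kernel: FreeBSD 4.x-style ATA drivers built with static unit numbering (the ATA_STATIC_ID kernel option) reserve two ad unit numbers per channel, master then slave, whether or not a disk is attached. The helper name below is hypothetical and only illustrates that scheme:

```python
def ad_unit(channel: int, slave: bool = False) -> int:
    """Static numbering sketch: two unit numbers reserved per ATA channel.

    Assumption: unit = 2 * channel + (0 for master, 1 for slave).
    """
    return 2 * channel + (1 if slave else 0)


# Channels and positions seen in the dmesg above, plus the unused slots.
for channel, slave in [(0, False), (0, True), (1, False), (1, True), (2, False)]:
    role = "slave" if slave else "master"
    print(f"ata{channel}-{role} -> ad{ad_unit(channel, slave)}")
```

Under that scheme ad1 would be the empty ata0-slave position, ad2 and ad3 would belong to ata1 (where the CD drive sits, numbered separately as acd0), and ata2-master lands on ad4.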

Mounting root from ufs:/dev/ad4s3a

ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 249842603 of 110387664-110387679 (ad4 bn 249842603; cn 15551 tn 250 sn 38) retrying
ad4: UDMA ICRC error writing fsbn 488278315 of 229605520-229605535 (ad4 bn 488278315; cn 30393 tn 234 sn 28) retrying
ad4: UDMA ICRC error writing fsbn 488278315 of 229605520-229605535 (ad4 bn 488278315; cn 30393 tn 234 sn 28) retrying
ad4: UDMA ICRC error writing fsbn 488278443 of 229605584-229605599 (ad4 bn 488278443; cn 30393 tn 236 sn 30) retrying

You are on slice2, presumably well up in the cylinder count.
Might the areas above be a geometry mapping conflict?
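
A quick arithmetic check of the numbers in the log, as a sketch only. It assumes "bn" is a 512-byte sector index counted from the start of the disk, and that the cn/tn/sn fields use a translated geometry of 255 heads and 63 sectors per track; neither is stated in the log. The probe line gives [387621/16/63] for ad4.

```python
# Geometry from the probe line: ad4: 190782MB <ST3200822AS> [387621/16/63]
PROBE_CYLS, PROBE_HEADS, PROBE_SPT = 387621, 16, 63

# Assumption: geometry behind the cn/tn/sn fields of the error message.
MSG_HEADS, MSG_SPT = 255, 63

# (bn, cn, tn, sn) as printed in the three distinct error lines.
ERRORS = [
    (249842603, 15551, 250, 38),
    (488278315, 30393, 234, 28),
    (488278443, 30393, 236, 30),
]


def lba_to_chs(lba, heads, spt):
    """Split a linear block number into (cylinder, head, zero-based sector)."""
    cyl, rem = divmod(lba, heads * spt)
    head, sector = divmod(rem, spt)
    return cyl, head, sector


total_sectors = PROBE_CYLS * PROBE_HEADS * PROBE_SPT
print("sectors per probe geometry:", total_sectors)

for bn, cn, tn, sn in ERRORS:
    matches = lba_to_chs(bn, MSG_HEADS, MSG_SPT) == (cn, tn, sn)
    print(f"bn {bn}: chs matches 255/63 = {matches}, "
          f"beyond probe capacity = {bn >= total_sectors}")
```

If the 512-byte assumption holds, the two bn values near 488 million lie past the sector count implied by the probe geometry, which would fit a mapping problem better than a failing surface. One caveat: the difference between the first two bn values is exactly twice the difference between the starts of their "of" ranges, so the two numbers may be in different units and the comparison against capacity may not be valid as it stands.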
Bill




