I/O errors on Hammer volume
dillon at apollo.backplane.com
Tue Apr 13 11:47:34 PDT 2010
:I've just found this in the daily security output of my workstation
:(running 2.6.1) :
:Checking setuid files and devices:
:find: /usr/pkgsrc/sysutils/sysinfo/CVS/Entries: Input/output error
:find: /usr/pkgsrc/devel/avltree/CVS/Entries: Input/output error
:find: /usr/pkgsrc/security/p5-Crypt-RSA/CVS/Entries: Input/output error
:find: /usr/pkgsrc/ham/dpbox/CVS/Entries.Log: Input/output error
:find: /usr/pkg/openoffice.org: Input/output error
:find: /usr/pkg/openoffice.org3: Input/output error
:These directories can no longer be used:
:# cd /usr/pkg
:# ls -ld o*
:ls: openoffice.org: Input/output error
:ls: openoffice.org3: Input/output error
:I'm not sure if this is due to a bug in HAMMER or to some sort of hardware
:failure. The hard disk is a 500GB Western Digital Green model with a single
:hammer version 4 volume. The directories belong to the root fs.
:Filesystem Size Used Avail Capacity Mounted on
:PFS500GP 460G 400G 60G 87% /
:There's no relevant error message in dmesg.
:Is there any tool I can use to diagnose and fix this problem ?
This is the first time I've seen this sort of problem reported.
There are two possibilities. The first is that it is a real I/O error,
but in that case I would have expected messages in the dmesg output.
The second is that the HAMMER volume somehow got corrupted. HAMMER
will generate EIO if records in the B-Tree specify offsets which
are out of bounds. If it were real corruption the CRCs would
fail and you would get dmesg output. If something else happened
that caused it to lay down good records that point to bad data
then it could report EIO without generating console output.
One thing you can do is see if the disk is completely readable
by using dd to read the raw drive. It should be able to read the
whole disk until it hits the end of the disk:
dd if=/dev/blahblah of=/dev/null bs=32k
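As a rough sketch of that check, the dd run can be wrapped so that its exit status is reported explicitly. The function name read_check is made up for illustration, and the device name is still a placeholder for whatever your raw disk is actually called:

```shell
#!/bin/sh
# Sketch only: read a device (or file) from start to end and report the result.
# read_check is a hypothetical helper name, not an existing tool.
# dd exits non-zero if a read fails, for example on an unreadable sector.
read_check() {
    target="$1"
    if dd if="$target" of=/dev/null bs=32k; then
        echo "readable to the end: $target"
    else
        echo "read failed: $target" >&2
        return 1
    fi
}

# Usage (placeholder device name, substitute your raw disk):
#   read_check /dev/blahblah
#   dmesg | tail
```

If the read fails partway through, checking dmesg right afterwards should show whether the kernel logged a matching disk error.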
Beyond that, think back to anything you might have done
in the past that is outside the norm for HAMMER operations
and might have caused the problem, such as adding or removing
a volume or adjusting the disklabel.
<dillon at backplane.com>