I/O errors on Hammer volume

Matthew Dillon dillon at apollo.backplane.com
Tue Apr 13 11:47:34 PDT 2010

:I've just found this in the daily security output of my workstation
:(running 2.6.1) :
:Checking setuid files and devices:
:find: /usr/pkgsrc/sysutils/sysinfo/CVS/Entries: Input/output error
:find: /usr/pkgsrc/devel/avltree/CVS/Entries: Input/output error
:find: /usr/pkgsrc/security/p5-Crypt-RSA/CVS/Entries: Input/output error
:find: /usr/pkgsrc/ham/dpbox/CVS/Entries.Log: Input/output error
:find: /usr/pkg/openoffice.org: Input/output error
:find: /usr/pkg/openoffice.org3: Input/output error
:These directories can no longer be used:
:# cd /usr/pkg
:# ls -ld o*
:ls: openoffice.org: Input/output error
:ls: openoffice.org3: Input/output error
:I'm not sure if this is due to a bug in HAMMER or to some sort of hardware
:failure. The hard disk is a 500GB Western Digital Green model with a single
:hammer version 4 volume. The directories belong to the root fs.
:Filesystem                    Size   Used  Avail Capacity  Mounted on
:PFS500GP                      460G   400G    60G    87%    /
:There's no relevant error message in dmesg.
:Is there any tool I can use to diagnose and fix this problem ?
:Francois Tigeot

    This is the first time I've seen this sort of problem reported.

    There are two possibilities.  The first is that it is a real I/O
    error, but in that case I would have expected messages in the
    dmesg output.  The second is that the HAMMER volume somehow got
    corrupted.  HAMMER will generate EIO if records in the B-Tree
    specify offsets which are out of bounds.  If it were real
    corruption, the CRCs would fail and you would get dmesg output.
    If something else happened that caused HAMMER to lay down good
    records that point to bad data, then it could report EIO without
    generating console output.

    One thing you can do is see if the disk is completely readable
    by using dd to read the raw drive. It should be able to read the
    whole disk until it hits the end of the disk:

	dd if=/dev/blahblah of=/dev/null bs=32k
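
    As a rough sketch, the same read test can be wrapped so that the
    result is easy to check from a script.  The function name
    read_check is made up for illustration, and /dev/blahblah is still
    only a placeholder for the real raw device:

```shell
# read_check PATH: read PATH from start to end and discard the data.
# The function's exit status is dd's exit status, so 0 means the
# whole thing was readable and non-zero means dd reported a failure.
# (read_check is a hypothetical helper name, not an existing tool.)
read_check() {
    dd if="$1" of=/dev/null bs=32k
}

# Example call, left commented out because the device name is a placeholder:
# read_check /dev/blahblah && echo "device read to the end without errors"
```

    A non-zero status here points toward a real media or driver
    problem rather than HAMMER itself.  If the read does stop partway,
    adding conv=noerror to the dd arguments makes dd continue past the
    failed block instead of exiting at the first error.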

    But beyond that, think back to anything you might have done
    in the past that is outside the norm for HAMMER operations
    and might have caused the problem, such as adding or removing
    a volume, adjusting the disklabel, or something like that.

					Matthew Dillon 
					<dillon at backplane.com>

