File system panic on recent HEAD

Matthew Dillon dillon at apollo.backplane.com
Wed Jun 4 09:20:55 PDT 2008


:OK, the box crashed again, two times.  For the first crash I don't have
:a failure message neither a crash dump.  The second crash is as follows:
:
:Jun  4 15:47:26 pc12909 kernel: HAMMER(ad10s2) Start Recovery
:30000000006836c8 - 30000000006882a8 (19424 bytes of UNDO)(RW)
:Jun  4 15:47:26 pc12909 kernel: HAMMER(ad10s2) End Recovery
:Jun  4 15:50:03 pc12909 kernel: Debugger("CRC FAILED: DATA") called.
:Jun  4 16:02:12 pc12909 syslogd: kernel boot file is /kernel
:Jun  4 16:02:12 pc12909 kernel: Debugger("CRC FAILED: B-TREE NODE")
:called.
:Jun  4 16:02:12 pc12909 kernel: panic: node 0xc3f8b698 0000000000000000
:vs 8000000004c87400
:Jun  4 16:02:12 pc12909 kernel: Jun  4 16:02:12 pc12909 kernel: mp_lock
:...
:
:Seems its somewhat related to HAMMER?!?  The fs is not full, there is
:plenty of space left ... just in case someone wants to know:
:
:ad10s2         9.8G   2.1G   7.7G    22%    /hammer
:
:Regards
:
:	Matthias

    Not related to HAMMER, other than that HAMMER is detecting that the
    information on the disk has become corrupt.

    I see three possibilities.  The most likely explanation is that
    your system memory has a hardware glitch and is becoming corrupt.
    A second possible explanation is that the disk driver's DMA is
    corrupting the data when it writes it to disk.

    The last possibility is software.  It could be that a bug in the
    kernel is being exercised by the package build, but if you were only
    running buildworld tests after creating that HAMMER filesystem,
    that isn't a likely scenario.

    In all my testing of HAMMER so far I have never actually gotten a
    real CRC mismatch.  I've always had to go in and munge a few bytes
    on the disk image to get it to fail.
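
    To illustrate what "munging a few bytes" does to a checksum, here is
    a minimal sketch.  It is not HAMMER's code: it assumes zlib's CRC-32
    as a stand-in for whatever CRC HAMMER uses, and the names store(),
    verify() and the 4096-byte block are hypothetical, chosen only for
    the illustration.

```python
import zlib

def store(payload):
    """Return the payload with the CRC that would be recorded next to it."""
    # Assumption: zlib's CRC-32 stands in for the filesystem's real CRC.
    return payload, zlib.crc32(payload)

def verify(payload, stored_crc):
    """Recompute the CRC on read and compare it with the recorded value."""
    return zlib.crc32(payload) == stored_crc

# A hypothetical 4096-byte block as it was written.
block = bytes(range(256)) * 16
data, crc = store(block)

# Simulate corruption (bad RAM, bad DMA, or a hand edit of the disk
# image) by flipping bits in two adjacent bytes.
damaged = bytearray(data)
damaged[100] ^= 0xFF
damaged[101] ^= 0x01

ok_clean = verify(data, crc)
ok_damaged = verify(bytes(damaged), crc)

print("clean block passes:", ok_clean)
print("damaged block passes:", ok_damaged)
```

    The check only says that the bytes read back differ from the bytes
    that were checksummed.  It cannot say whether the damage happened in
    memory, in the driver, or on the media, which is why the hardware
    needs to be examined.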

    I'll bet you have a system memory issue.  Either something is
    overheating or one of your ram sticks is heading towards failure.  There
    are a few things you can try before you start ripping the machine apart:

    * Go into the BIOS setup and see if it has options to adjust the dynamic
      ram timing, FSB (front side bus) frequency, and cpu frequency.  If it
      does, slow them ALL down a little and see if the problem goes away.

    * Check the temperature on all major chips on the MB by touching the top
      of the chip.  Also check the temperature of your ram sticks.

    * Check that the hard drive is not overheating.

    * Try replacing the ram.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




