panic: assertion: layer2->zone == zone in hammer_blockmap_free

Sat Aug 2 20:27:17 PDT 2008

:Matt, you sent me two other messages privately, but I think this message
:covers what you asked me in them.  apollo doesn't like my IP address, so
:I need to configure my mail to go through ISP to do so (and I haven't, yet).

    Oh, it's the .ppp. in the reverse DNS.  I need to change over to spamcop
    or something.

:$ ls -l /HAMMER
:total 0
:lrwxr-xr-x  1 root    wheel   26 Jul 19 15:09 obj -> @@0xffffffffffffffff:00001
:lrwxr-xr-x  1 root    wheel   26 Jul 19 16:24 slave -> @@0xffffffffffffffff:00002
:
:/HAMMER is the only HAMMER filesystem on this machine and is mounted without
:nohistory flags.
:
:...
:It has experienced two types of major crashes so far: the first one was
:triggered by an attempted cross-device link in the middle of July.
:The other was triggered by network code (reused socket on connect).
:According to /var/log/messages, the recovery was run only once, though.
:
:  Jul 19 11:34:52 firebolt kernel: HAMMER(HAMMER) Start Recovery 30000000002c7350 - 30000000002c93f0 (8352 bytes of UNDO)(RW)
:  Jul 19 11:34:53 firebolt kernel: HAMMER(HAMMER) End Recovery

    Ok, I'm not so worried about the net crash.  The cross-device link
    crashes (before we fixed it) are interesting... those could be important.

    Did you newfs_hammer the filesystem after the cross-device link crashes
    or is it possible that some cruft from those crashes leaked through to
    current-day?  The timestamp in that filesystem's FSID reads July 18th,
    which was right around when you reported that issue.

:I use mirror-copy to sync the slave.  ${.OBJDIR} for buildworld usually
:grows up to 2 Gbytes, and ${WRKDIR}s for pkgsrc can reach around 1 Gbyte
:if I build a meta-package.  I usually run mirror-copy after buildworld or
:building packages, remove the directories in master, then run mirror-copy
:again to see if the removal of files or directories is properly propagated
:to the slave.

    Yah, that's pretty much what I've been doing for testing too.
    I usually also throw in a reblock run in a sleep loop, and an
    occasional prune-everything in its own sleep loop.  Running everything
    in parallel is a pretty good test.

:I remember interrupting reblock on /HAMMER/obj, but I haven't done
:mirror-copy to slave after that, so I don't think it's something to do
:with it.

    Ok.  Interrupting reblocking should be fine.  It isn't a real interrupt;
    the kernel code polls for the signal at a safe point.
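
    To illustrate the idea of polling at a safe point, here is a rough
    user-space sketch.  It is NOT the HAMMER kernel code: the names
    reblock_sketch(), move_block() and stop_requested are made up for the
    example, and the "block move" is just a flag being set.  The signal
    handler only records the request; the loop checks it between blocks,
    so a block that is already in flight always completes.

```c
#include <signal.h>
#include <stddef.h>

/* Set by the handler; the work loop only looks at it between blocks. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;     /* record the request, do no work here */
}

/*
 * Hypothetical stand-in for relocating one block.  If interrupt_at
 * matches the block index, SIGINT is raised to simulate ^C arriving
 * while the block is being moved.
 */
static void move_block(int *done, size_t i, long interrupt_at)
{
    if ((long)i == interrupt_at)
        raise(SIGINT);      /* signal arrives mid-block */
    done[i] = 1;            /* the block still completes */
}

/* Returns the number of blocks fully processed before stopping. */
size_t reblock_sketch(int *done, size_t nblocks, long interrupt_at)
{
    size_t i;

    stop_requested = 0;
    signal(SIGINT, on_sigint);
    for (i = 0; i < nblocks; i++) {
        /* Safe point: nothing is half-moved here, so stopping is harmless. */
        if (stop_requested)
            break;
        move_block(done, i, interrupt_at);
    }
    signal(SIGINT, SIG_DFL);
    return i;
}
```

    With the signal simulated during block 3, block 3 still finishes and
    the loop stops before touching block 4.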

:>     * Are the ~500K inodes mostly associated with the slave or unrelated?
:
:They are mostly associated with /HAMMER/source and /HAMMER/obj.
:
:Cheers.

    I'm crossing my fingers and hoping that the issue was related
    to the cross-device link crashes.  If your filesystem still has some
    cruft from those crashes then I will undo the test locally so I can
    reproduce the cross-link crashes and see if I can corrupt the
    filesystem that way.

    If you can, please wipe that filesystem and continue testing fresh,
    and see if you can reproduce that panic (or any bug).

    --

    While trying to reproduce your panic today I found another, unrelated
    bug which I will commit a fix for tomorrow.  There's a small window
    of opportunity where reblocking live data can interfere with programs
    accessing that live data.  It only affects the data though, not the
    meta-data, so it can't be related to the panic you got.

    I am also planning on writing a 'hammer fsck' feature to clean up
    corrupted freemaps & (maybe) B-Trees... kinda a last-resort directive.
    It will probably take most of next week to do.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>