panic: assertion: leaf->base.obj_id == ip->obj_id in hammer_ip_delete_range

Mon Nov 2 21:06:32 PST 2009

:On Thu, Oct 22, 2009 at 06:19:01PM -0700, Matthew Dillon wrote:
:> 	fetch http://apollo.backplane.com/DFlyMisc/hammer06.patch
:
:It seems like btree_remove() sets cursor->parent to NULL in its
:recursion path starting at hammer_btree.c:2226 but somehow returns 0
:which ends up hitting the first KKASSERT() in hammer_cursor_removed_node().

    Regarding your vkernel panic: the panic message itself will be correct,
    but the symbols in the backtrace are clearly messed up.  For some
    reason the vkernel reports symbols incorrectly; I don't know why.

    But given that panic message:

	panic: assertion: parent != NULL in hammer_cursor_removed_node

    The only possible path is via btree_remove().  I'm a bit at a loss
    here.  I don't see how that panic could still occur with the recent
    patches.  I went through your emails again and found this comment:

"By the way, I caught a different panic on vkernel.  I think the last
I ran `hammer cleanup' on /usr/obj was before applying hammer05.patch
or hammer06.patch to vkernel."

    Was that vkernel backtrace a pre-patch panic?

				--------

    Going back to crash dumps .10 and .11, which panicked at:

	panic: assertion: s <= 0 in hammer_btree_iterate

    I think I see a possible issue.  If hammer_btree_remove() fails with
    EDEADLK, hammer_btree_delete() ignores the error on line 897.  This
    is correct, we WANT to ignore the error because it is ok for
    hammer_btree_remove()'s recursion to fail... it just means we could
    not recursively delete the internal nodes to get rid of the empty
    leaf.  The leaf is simply left empty.
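    A toy model may make the "ignore EDEADLK, leave the leaf empty" point
    concrete.  This is a hypothetical sketch, not HAMMER code: struct
    toy_node, toy_remove_empty_leaf() and toy_delete() are invented names
    that only mimic the behavior described above for hammer_btree_remove()
    and hammer_btree_delete().  The 'contended' flag is an assumption that
    stands in for failing to lock the parent node.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy node; not HAMMER's struct hammer_node. */
struct toy_node {
	int count;               /* elements currently in the node */
	struct toy_node *parent; /* NULL for the root */
};

/*
 * Hypothetical stand-in for the recursive cleanup in hammer_btree_remove():
 * try to unlink an empty leaf from its parent.  'contended' simulates
 * failing to lock the parent, which is reported as EDEADLK.
 */
static int toy_remove_empty_leaf(struct toy_node *leaf, int contended)
{
	if (leaf->count != 0 || leaf->parent == NULL)
		return 0;               /* nothing to clean up */
	if (contended)
		return EDEADLK;         /* give up; the leaf stays linked */
	leaf->parent->count--;      /* drop the parent's reference */
	leaf->parent = NULL;
	return 0;
}

/*
 * Hypothetical stand-in for hammer_btree_delete(): delete one element,
 * then try the recursive cleanup and deliberately ignore EDEADLK.
 */
static int toy_delete(struct toy_node *leaf, int contended)
{
	int error;

	assert(leaf->count > 0);
	leaf->count--;
	if (leaf->count == 0) {
		error = toy_remove_empty_leaf(leaf, contended);
		if (error != 0 && error != EDEADLK)
			return error;
		/* On EDEADLK the empty leaf is simply left in the tree. */
	}
	return 0;
}
```

    With 'contended' set, toy_delete() still returns 0 and the leaf stays
    attached to its parent with count == 0, which is the "leaf is simply
    left empty" outcome.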

    However I think this opens up an error path where the cursor can wind
    up in a bad state when EDEADLK is returned and lead to the assertion.
    It's just a guess at the moment.  The normal case is clearly not
    causing any problems; otherwise you'd get the panic instantly.  It
    takes CPU/disk load and time for the panic to occur, so it has
    to be in the EDEADLK handling somewhere.

    Another possibility is via hammer_btree_do_propagation(), which is also
    called indirectly inside that loop.  This code pushes the cursor,
    does some work, then pops the cursor.  But pushing a cursor unlocks it,
    so some other third-party operation can wind up adjusting it.  It
    is possible that some other deletion caused the node under the cursor
    to be removed, causing the cursor to be adjusted to the parent node.
    If the node that was removed was a node under the root node, then the
    new cursor->node will become the root node and the cursor->parent will
    become NULL.  This should be ok (we are talking about the s <= 0 panic
    here, not the cursor->parent != NULL panic).
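    The adjustment described above can be sketched with a simplified
    cursor.  This is a hypothetical model, not the real hammer_cursor:
    struct tnode, struct tcursor, tcursor_push(), tcursor_pop() and
    tcursor_node_removed() are invented names.  It assumes that a removed
    node records the slot it occupied in its parent and that the cursor is
    repositioned at that slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified node and cursor; not HAMMER's real structures. */
struct tnode {
	struct tnode *parent;  /* NULL for the root */
	int slot;              /* index of this node within its parent */
};

struct tcursor {
	struct tnode *node;
	struct tnode *parent;
	int index;             /* element index within node */
	int locked;
};

/* Pushing the cursor releases its lock; popping reacquires it. */
static void tcursor_push(struct tcursor *c) { c->locked = 0; }
static void tcursor_pop(struct tcursor *c)  { c->locked = 1; }

/*
 * Run by a third party that removed node 'gone' while our cursor was
 * unlocked.  A cursor sitting on 'gone' is moved up to the parent, at the
 * slot 'gone' used to occupy.  If the parent is the root, c->parent
 * ends up NULL.
 */
static void tcursor_node_removed(struct tcursor *c, const struct tnode *gone)
{
	assert(!c->locked);    /* adjustments only happen while unlocked */
	if (c->node != gone)
		return;
	c->node = gone->parent;
	c->index = gone->slot;
	c->parent = c->node != NULL ? c->node->parent : NULL;
}
```

    In the root case the cursor comes back from the pop sitting on the root
    with a NULL parent, which is legal on its own; code that later assumes
    a non-NULL parent is what would trip an assertion.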

    But I'm thinking one of the above two conditions is causing the cursor
    to get whacked out of shape badly enough to hit the s <= 0 assertion
    in the iteration.

    For the assertion to fail the cursor would have to be indexed to
    BEFORE the beginning of the range.  This can only occur if the
    cursor gets adjusted while unlocked or is munged beyond hope in
    the hammer_btree_do_propagation() or hammer_btree_remove() sequence.
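    A sketch of the invariant, assuming s is the three-way comparison of
    the range's starting key against the element under the cursor (an
    assumption, consistent with the assertion failing only when the cursor
    is indexed before the range).  struct tkey, tkey_cmp(),
    cursor_in_range() and toy_iterate_check() are hypothetical, and the
    two-field key is a simplification of HAMMER's real B-Tree key.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-field key; HAMMER's real B-Tree key has more fields. */
struct tkey {
	int64_t obj_id;
	int64_t key;
};

/* Three-way compare: -1 if a < b, 0 if equal, 1 if a > b. */
static int tkey_cmp(const struct tkey *a, const struct tkey *b)
{
	if (a->obj_id != b->obj_id)
		return a->obj_id < b->obj_id ? -1 : 1;
	if (a->key != b->key)
		return a->key < b->key ? -1 : 1;
	return 0;
}

/* True when the element under the cursor does not sort before key_beg. */
static int cursor_in_range(const struct tkey *key_beg, const struct tkey *elm)
{
	int s = tkey_cmp(key_beg, elm);

	return s <= 0;
}

/* Mirrors the shape of the s <= 0 check described in the text. */
static void toy_iterate_check(const struct tkey *key_beg, const struct tkey *elm)
{
	assert(cursor_in_range(key_beg, elm));
}
```

    An element at or after key_beg passes; an element that sorts before
    key_beg gives s > 0, which is the "indexed to BEFORE the beginning of
    the range" case.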

    I haven't found the smoking gun yet.  This code is terribly complex.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>