Still looking for reports of missed directory entries w/ HAMMER

Matthew Dillon dillon at apollo.backplane.com
Thu Apr 16 09:21:30 PDT 2009


:I can't say for sure, but there is a high probability that the machine crashed
:during a recopy operation.
:This can take a long time (too many files, 28K directories per run)
:
:>     That looks like a case where the directory entry exists but the inode
:>     does not.  I have seen this occur before in crash recovery cases but
:>     I had thought I had fixed it.  There was an edge case where a directory
:>     entry can get synced to disk in a different transaction than the inode.
:>     If the machine crashes right then you wind up with the above situation.
:
:The buggy directories were created after I upgraded to DragonFly-2.2.
:
:>     If the files should have been stable then try rebooting the machine
:>     and see if the problem is still present.  That will tell me whether
:>     it's a namecache effect in the kernel or something that got synced
:>     to the media.
:
:The media is definitely corrupt: I rebooted the machine and it is still
:impossible to delete the directories.
:All error messages stay the same.
:
:Additionally, the following kernel messages were emitted during the last two days
:(after the reboot):
:
:Warning: BTREE_REMOVE: Defering parent removal2 @ 80000013e5b2d000, skipping
:Warning: BTREE_REMOVE: Defering parent removal2 @ 8000002ba5b63000, skipping
:Warning: BTREE_REMOVE: Defering parent removal2 @ 800000369defc000, skipping
:Warning: BTREE_REMOVE: Defering parent removal2 @ 800000372a107000, skipping
: 
:-- 
:Francois Tigeot

    Those warnings can be ignored.  It just means HAMMER couldn't immediately
    remove an empty B-Tree leaf because someone else had one of the 
    parent B-Tree nodes locked.  In fact, I will remove the message from
    the sources.
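    The "deferring" behaviour can be pictured as a non-blocking lock
    attempt on the parent node.  The sketch below is a toy model, not
    HAMMER's code: the Node class and remove_empty_leaf() are hypothetical
    names, and Python's threading.Lock stands in for a B-Tree node lock.
    If the parent is busy, the empty leaf is simply left in place (the
    "skipping" case) rather than waiting for the lock.

```python
import threading


class Node:
    """Hypothetical B-Tree node: a lock plus a list of child nodes."""

    def __init__(self, children=None):
        self.lock = threading.Lock()
        self.children = children if children is not None else []


def remove_empty_leaf(parent, leaf):
    """Try to unlink an empty leaf from its parent.

    Returns True if the leaf was removed, False if nothing was removed
    (leaf not empty, or the parent was locked by someone else).
    """
    if leaf.children:
        return False  # leaf still holds entries; nothing to clean up
    # Non-blocking attempt: if another holder has the parent locked,
    # skip the removal instead of waiting.  An empty leaf is harmless
    # and can be reclaimed on a later pass.
    if not parent.lock.acquire(blocking=False):
        return False
    try:
        parent.children.remove(leaf)
        return True
    finally:
        parent.lock.release()
```

    Skipping instead of blocking avoids stalling (or deadlocking) a
    deletion path over a purely cosmetic cleanup.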

    I think the media issue is probably due to the crash.  It isn't actually
    corrupt, i.e. the UNDO works properly, but the directory entry wound
    up getting created in a different transaction than the inode and the
    crash occurred in between, so we wound up with a directory entry and no
    inode post-crash.
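    A small simulation makes the failure mode concrete.  This is a
    sketch under stated assumptions, not HAMMER's flusher: each
    transaction is assumed to land on the media atomically (the UNDO
    guarantee), a "crash" drops every later transaction, and the record
    tuples and the flush()/dangling_entries() helpers are hypothetical.
    Splitting the directory entry and the inode across two transactions
    leaves a dangling entry; committing them together does not.

```python
def flush(transactions, crash_after=None):
    """Apply transactions in order; each one becomes visible atomically.

    If crash_after is an index, the simulated machine crashes right after
    that transaction commits, so later transactions never reach the media.
    """
    media = set()
    for i, txn in enumerate(transactions):
        media |= set(txn)  # whole transaction lands at once
        if i == crash_after:
            break  # simulated crash
    return media


def dangling_entries(media):
    """Names of directory entries whose inode is missing on the media."""
    # Hypothetical record shapes: ("inode", inum) and ("dirent", name, inum).
    inodes = {rec[1] for rec in media if rec[0] == "inode"}
    return sorted(rec[1] for rec in media
                  if rec[0] == "dirent" and rec[2] not in inodes)


split = [[("dirent", "foo", 42)], [("inode", 42)]]      # two transactions
together = [[("dirent", "foo", 42), ("inode", 42)]]     # one transaction
print(dangling_entries(flush(split, crash_after=0)))     # ['foo']
print(dangling_entries(flush(together, crash_after=0)))  # []
```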

    I will adjust HAMMER to allow those dead entries to be removed and look
    into the flush sequencing again, after I've tracked down the directory
    issue.

    Dealing with link counts on inodes is actually quite difficult
    because there can be multiple directory entries pending in different
    transactions.  I'm very careful to sync the inode's link count after
    discounting directory entries queued for later flush transactions.
    However, I think from your report there must be a bug where the initial
    creation of an inode is not occurring in the same flush transaction as
    the creation of the related directory entry.
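    One way to read the link-count rule: the in-memory link count already
    reflects every pending directory operation, so the count written with
    the inode in a given flush must back out the operations queued for
    later flushes.  The sketch below assumes each pending entry carries a
    flush-group number and a +1/-1 delta; PendingDirent,
    link_count_for_flush() and creation_in_same_flush() are hypothetical
    names for illustration, not HAMMER functions.  The second helper
    expresses the invariant suspected above: an inode's creation should
    be in the same flush as its first directory entry.

```python
from dataclasses import dataclass


@dataclass
class PendingDirent:
    flush_group: int  # hypothetical: flush transaction this op is queued for
    delta: int        # +1 for a new directory entry, -1 for a removal


def link_count_for_flush(in_memory_nlinks, pending, flush_group):
    """Link count to write with the inode in `flush_group`.

    Undo the effect of directory operations queued for *later* flush
    groups, so the on-media count matches the entries that actually
    land in this transaction.
    """
    later = sum(p.delta for p in pending if p.flush_group > flush_group)
    return in_memory_nlinks - later


def creation_in_same_flush(inode_create_group, pending):
    """True if the inode is created in the same flush as its first new entry."""
    adds = [p.flush_group for p in pending if p.delta > 0]
    return not adds or inode_create_group == min(adds)


ops = [PendingDirent(5, +1), PendingDirent(6, +1)]
print(link_count_for_flush(2, ops, 5))  # 1: the second link is not on disk yet
print(link_count_for_flush(2, ops, 6))  # 2
```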

    --

    On the 'ls' issue: it's definitely a different issue.  I got another
    report from Peter, who saw it happen on Avalon.  It's definitely some
    sort of transient cache effect or a bug in the run-time that does
    NOT affect the media.  I'm going to try to track that one down first.
    Maybe it's a bug in HAMMER's getdirentries() routine.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>
