Still looking for reports of missed directory entries w/ HAMMER
Matthew Dillon
dillon at apollo.backplane.com
Thu Apr 16 09:21:30 PDT 2009
:I can't say for sure, but the probability is high the machine crashed during a
:recopy operation.
:This can take a long time (too many files, 28K directories per run)
:
:> That looks like a case where the directory entry exists but the inode
:> does not. I have seen this occur before in crash recovery cases but
:> I had thought I had fixed it. There's was an edge case where a directory
:> entry can get synced to disk in a different transaction then the inode.
:> If the machine crashes right then you wind up with the above situation.
:
:The buggy directories were created after I upgraded to DragonFly-2.2.
:
:> If the files should have been stable then try rebooting the machine
:> and see if the problem is still present. That will tell me whether
:> its a namecache effect in the kernel or something that got synced
:> to the media.
:
:The media is definitely corrupt: I rebooted the machine and it is still
:impossible to delete the directories.
:All error messages stay the same.
:
:Additionally, the following kernel messages were emitted during the last two days
:(after the reboot):
:
:Warning: BTREE_REMOVE: Defering parent removal2 @ 80000013e5b2d000, skipping
:Warning: BTREE_REMOVE: Defering parent removal2 @ 8000002ba5b63000, skipping
:Warning: BTREE_REMOVE: Defering parent removal2 @ 800000369defc000, skipping
:Warning: BTREE_REMOVE: Defering parent removal2 @ 800000372a107000, skipping
:
:--
:Francois Tigeot
Those warnings can be ignored. It just means HAMMER couldn't immediately
remove an empty B-Tree leaf because someone else had one of the
parent B-Tree nodes locked. In fact, I will remove the message from
the sources.
I think the media issue is probably due to the crash. It isn't actually
corrupt, i.e. the UNDO works properly, but the directory entry wound
up getting created in a different transaction then the inode and the
crash occured inbetween, so we wound up with a directory entry and no
inode post-crash.
I will adjust HAMMER to allow those dead entries to be removed and look
into the flush sequencing again, after I've tracked down the directory
issue.
Dealing with link counts on inodes is actually quite difficult
because there can be multiple directory entries pending in different
transactions. I'm very careful to sync the inode's link count after
discounting directory entries queued for later flush transactions.
However, I think from your report there must be a bug where the initial
creation of an inode is not occuring in the same flusn transaction as
the creation of the related directory entry.
--
On the 'ls' issue.. its definitely a different issue. I got another
report from Peter who saw it happen on Avalon. It's definitely some
sort of transient cache effect or a bug in the run-time that does
NOT effect the media. I'm going to try to track that one down first.
Maybe its a bug in HAMMER's getdirentries() routine.
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the Bugs
mailing list