Still looking for reports of missed directory entries w/ HAMMER

Francois Tigeot ftigeot at wolfpond.org
Sun Apr 19 00:08:12 PDT 2009


On Thu, Apr 16, 2009 at 09:18:24AM -0700, Matthew Dillon wrote:
> :I can't say for sure, but the probability is high the machine crashed during a
> :recopy operation.
> :This can take a long time (too many files, 28K directories per run)
> :
> :>     That looks like a case where the directory entry exists but the inode
> :>     does not.  I have seen this occur before in crash recovery cases but
> :>     I had thought I had fixed it.  There was an edge case where a directory
> :>     entry can get synced to disk in a different transaction than the inode.
> :>     If the machine crashes right then you wind up with the above situation.
> :
> :The media is definitely corrupt: I rebooted the machine and it is still
> :impossible to delete the directories.
> :All error messages stay the same.
> 
>     I think the media issue is probably due to the crash.  It isn't actually
>     corrupt, i.e. the UNDO works properly, but the directory entry wound
>     up getting created in a different transaction than the inode and the
>     crash occurred in between, so we wound up with a directory entry and no
>     inode post-crash.

I have more information: a new corrupt directory appeared this morning, and
the machine has *not* crashed in the meantime (uptime: 5 days).

For some reason, rsnapshot seems to be really good at triggering this sort of
bug.
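
In case it helps anyone else check their own backup trees, here is a minimal
sketch of a scanner for this symptom. It assumes the problem shows up as a name
that the directory listing returns but that lstat() cannot resolve (a directory
entry with no inode behind it); the exact errno HAMMER returns may differ, so
the script reports whatever it gets. The function name find_dangling_entries is
only illustrative and not part of any HAMMER tool.

```python
import errno
import os
import stat


def find_dangling_entries(root):
    """Yield (path, errno) for every name that cannot be resolved.

    A name is reported when the directory listing returns it but lstat()
    fails on it. Directories that cannot be listed at all are reported too,
    since the scan cannot look inside them.
    """
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            names = os.listdir(directory)
        except OSError as exc:
            # Could not list this directory (missing, unreadable, ...).
            yield directory, exc.errno
            continue
        for name in names:
            path = os.path.join(directory, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError as exc:
                # The entry was listed but has no usable inode behind it.
                yield path, exc.errno
                continue
            if stat.S_ISDIR(st.st_mode):
                stack.append(path)


def describe(err):
    """Return the symbolic errno name (e.g. 'ENOENT') when it is known."""
    return errno.errorcode.get(err, str(err))
```

To use it, loop over find_dangling_entries() with the root of the snapshot tree
(the path is site-specific) and print each path together with describe(err).
A healthy tree should produce no output.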

-- 
Francois Tigeot




