Still looking for reports of missed directory entries w/ HAMMER
Matthew Dillon
dillon at apollo.backplane.com
Fri Apr 17 14:09:54 PDT 2009
Ok, here is what I've come up with so far:
* ls sometimes appears not to list some entries, which then show up later.
  - In the case where the effect is temporary, ls does not appear to
    report an error, and it is something of a question mark as to why.
  - The case where the effect is permanent and ls reports errors for
    some files is a different issue, related to crash recovery. I
    will be making it possible to delete such files, but I want to
    track down the temporary case first.
* zsh has a correction feature. With zsh, the temporary issue appears as
  a prompt to correct the name of a file, where the offered correction
  is exactly the same name. Peter Avalos sometimes hit this with e.g.
  vi <somefilename>.
  - zsh makes an access() call, which apparently fails, and then
    reads the contents of the directory to come up with the proposed
    correction. The directory appears to contain the correct
    filename.
* cpdup. I have seen this temporary issue when using cpdup to copy from
  NFS to HAMMER.
  - cpdup does not report an error if it creates a sub-directory as
    part of the cpdup operation but is then unable to stat or chown
    it. This causes cpdup to incorrectly believe that the target
    directory is on a different filesystem, so it silently fails to
    copy anything into it.
    The bug is at line 840 of /usr/bin/cpdup.c. I will commit a fix
    right now so that it reports the error.
  - Yet when I check manually with ls, the target directory DOES exist;
    cpdup simply failed to copy anything into it.
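The cpdup item above describes a create-then-stat sequence whose stat() failure was not reported. The following is a minimal sketch of that pattern with the error surfaced, assuming (not confirmed by the report) that the "different filesystem" conclusion comes from comparing device numbers after a stat() that did not actually succeed. It is NOT cpdup's code; the function name make_target_dir and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, NOT cpdup's actual code: create the target
 * directory, then stat() it right away. If stat() fails, report the
 * error and return -1 instead of carrying on with an unfilled
 * struct stat. On success, *same_fs is set to 1 when the new directory
 * lives on device parent_dev (the device of its parent), else 0.
 */
static int
make_target_dir(const char *path, dev_t parent_dev, int *same_fs)
{
    struct stat st;

    assert(path != NULL && same_fs != NULL);
    if (mkdir(path, 0755) < 0 && errno != EEXIST) {
        fprintf(stderr, "mkdir %s: %s\n", path, strerror(errno));
        return -1;
    }
    if (stat(path, &st) < 0) {
        /* The symptom described above: creation succeeded, lookup failed. */
        fprintf(stderr, "stat %s after mkdir: %s\n", path, strerror(errno));
        return -1;
    }
    if (!S_ISDIR(st.st_mode)) {
        fprintf(stderr, "%s exists but is not a directory\n", path);
        return -1;
    }
    /* Compare devices only once stat() is known to have filled in st. */
    *same_fs = (st.st_dev == parent_dev);
    return 0;
}
```

The point of the design is ordering: the device comparison happens only after a successful stat(), so a transient lookup failure becomes a reported error rather than a silent "different filesystem" decision.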
From this information, it is my belief that this issue is due to a
cache effect when a file or directory is being created or was recently
created. The effect can cause lookups of the file or directory to fail,
even though the creation actually succeeded. When the cache effect goes
away, the entry becomes visible, consistent with what is on the media.
The directory appears to properly contain the entry, but the entry
cannot be stat()'d, etc. Once the cache effect in the kernel
self-corrects (probably because the cached state is simply discarded),
the entry can be stat()'d again. This is different from the permanent
media issue reported by Francois Tigeot.
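The symptom in the paragraph above (and in the zsh case, where the directory listing contains a name that a direct lookup misses) can be checked by comparing readdir() against lstat(). This is a minimal diagnostic sketch; the function name count_unstatable and the fixed buffer size are hypothetical choices. On a busy directory a concurrent unlink between readdir() and lstat() would also be counted, so a nonzero result is a clue, not proof.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define SCAN_PATH_MAX 1024 /* arbitrary buffer size for this sketch */

/*
 * Hypothetical diagnostic: list a directory with readdir() and lstat()
 * every name it returns. A name that readdir() reports but lstat()
 * cannot find matches the symptom described above. Returns the number
 * of such entries, or -1 if the directory cannot be opened.
 */
static int
count_unstatable(const char *dirpath)
{
    char path[SCAN_PATH_MAX];
    struct dirent *de;
    struct stat st;
    DIR *dir;
    int bad = 0;
    int n;

    assert(dirpath != NULL);
    if ((dir = opendir(dirpath)) == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        /* Skip names whose full path does not fit in the buffer. */
        n = snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;
        /* lstat() so a dangling symlink is not counted as a missing entry. */
        if (lstat(path, &st) < 0) {
            fprintf(stderr, "%s: listed by readdir() but lstat() failed: %s\n",
                path, strerror(errno));
            bad++;
        }
    }
    closedir(dir);
    return bad;
}
```

Running it repeatedly against a directory where files were just created would show whether the readdir()/lookup mismatch is transient, as described above.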
So now I am trying to find a case where newly created, modified, or
renamed files somehow get confused in the kernel cache. That's where
I am: no smoking gun yet, but a lot of clues.
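One way to hunt for such a case is a loop that creates and renames files and looks each one up immediately afterwards. The following is a minimal sketch under that assumption; the function name create_rename_check and the stress.N.a / stress.N.b file names are hypothetical, and nothing in the report says this particular loop reproduces the problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical stress loop: in directory `dir`, repeatedly create a
 * file, stat() it, rename() it, and stat() the new name. On a healthy
 * filesystem every stat() succeeds; a failure right after a successful
 * create or rename would match the suspected cache effect. Returns the
 * number of failed lookups, or -1 if a create, rename, or path
 * construction itself fails.
 */
static int
create_rename_check(const char *dir, int iterations)
{
    char a[1024], b[1024];
    struct stat st;
    int failures = 0;
    int fd, i, n;

    assert(dir != NULL && iterations >= 0);
    for (i = 0; i < iterations; i++) {
        n = snprintf(a, sizeof(a), "%s/stress.%d.a", dir, i);
        if (n < 0 || (size_t)n >= sizeof(a))
            return -1;
        n = snprintf(b, sizeof(b), "%s/stress.%d.b", dir, i);
        if (n < 0 || (size_t)n >= sizeof(b))
            return -1;

        if ((fd = open(a, O_CREAT | O_WRONLY | O_TRUNC, 0644)) < 0)
            return -1;
        close(fd);
        if (stat(a, &st) < 0) {
            fprintf(stderr, "stat %s after create: %s\n", a, strerror(errno));
            failures++;
        }
        if (rename(a, b) < 0)
            return -1;
        if (stat(b, &st) < 0) {
            fprintf(stderr, "stat %s after rename: %s\n", b, strerror(errno));
            failures++;
        }
        unlink(b); /* clean up so the directory does not grow */
    }
    return failures;
}
```

Pointing it at a HAMMER directory (and at a directory on another filesystem for comparison) would show whether lookups immediately after a create or rename ever fail.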
-Matt