Still looking for reports of missed directory entries w/ HAMMER

Matthew Dillon dillon at apollo.backplane.com
Fri Apr 17 14:09:54 PDT 2009


    Ok, here is what I've come up with so far:

    * ls sometimes appears to not list some entries, but then they show up.

	- In the case where the effect is temporary, ls does not appear to
	  report an error, and it is still an open question why.

	- The case where the effect is permanent, and ls reports errors for
	  some files, is a different issue related to crash recovery.  I
	  will be making it possible to delete such files, but I want to
	  track down the temporary case first.

    * zsh has a correction feature.  With zsh the temporary issue shows up
      as a request to correct the name of a file, where zsh then offers
      exactly the same name as the correction.  Peter Avalos hit this
      sometimes with e.g. vi <somefilename>.

	- Zsh does an access() call, which apparently fails, and then
	  reads the contents of the directory to come up with the proposed
	  correction... the directory appears to contain the correct
	  filename.
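
    The zsh symptom above (access() fails, yet the directory scan returns
    the name) can be checked for directly.  What follows is only a minimal
    sketch under assumptions: the function names dir_lists_name() and
    listed_but_not_accessible() are hypothetical, and this is not what zsh
    itself does internally.  It only uses the standard access(), opendir()
    and readdir() calls.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: request POSIX declarations */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: scan `dir` with readdir() and report whether an
 * entry called `name` is listed.  Returns 1 if listed, 0 if not listed,
 * -1 if the directory could not be opened.
 */
static int dir_lists_name(const char *dir, const char *name)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    int found = 0;

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(d);
    return found;
}

/*
 * Hypothetical probe for the mismatch described above.
 * Returns 1 if access() fails on dir/name although the directory scan
 * lists the name, 0 if the two views agree, -1 on error.
 */
int listed_but_not_accessible(const char *dir, const char *name)
{
    char path[1024];
    int len;

    assert(dir != NULL && name != NULL);
    len = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (len < 0 || len >= (int)sizeof(path))
        return -1;

    /* The lookup works, so there is nothing unusual to report. */
    if (access(path, F_OK) == 0)
        return 0;

    /* The lookup failed; ask the directory itself whether it has the name. */
    return dir_lists_name(dir, name);
}
```

    A return value of 1 would correspond to the temporary case: the name
    is in the directory listing but cannot be looked up.  The two checks
    are not atomic, so a file created between them would also give 1.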

    * cpdup.  I have seen this temporary issue with cpdup from NFS to HAMMER.
	
	- cpdup does not report an error if it creates a sub-directory as 
	  part of the cpdup operation, but is then unable to stat or chown
	  it.  This causes cpdup to incorrectly believe that the target
	  directory is in a different filesystem and it silently fails to
	  copy anything into it.

	  The bug is line 840 in /usr/bin/cpdup.c.  I will commit a fix
	  right now so it reports the error.

	- However, when I check manually with ls, the target directory DOES
	  exist; cpdup simply failed to copy anything into it.
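
    The general shape of the cpdup fix is to treat a failed stat() on a
    freshly created directory as an error rather than ignoring it.  The
    following is a minimal sketch under assumptions, not the actual cpdup
    code: make_dir_checked() is a hypothetical name, and line 840 of
    cpdup.c is not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: request POSIX declarations */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: create `path` as a directory (an existing
 * directory is accepted) and stat() it afterwards.  Any failure is
 * reported on stderr instead of being silently ignored.
 * Returns 0 on success with *st filled in, -1 on failure.
 */
int make_dir_checked(const char *path, mode_t mode, struct stat *st)
{
    assert(path != NULL && st != NULL);

    if (mkdir(path, mode) != 0 && errno != EEXIST) {
        fprintf(stderr, "mkdir %s failed: %s\n", path, strerror(errno));
        return -1;
    }

    /* This is the step whose failure must not go unreported. */
    if (stat(path, st) != 0) {
        fprintf(stderr, "stat %s failed after mkdir: %s\n",
                path, strerror(errno));
        return -1;
    }

    if (!S_ISDIR(st->st_mode)) {
        fprintf(stderr, "%s exists but is not a directory\n", path);
        return -1;
    }
    return 0;
}
```

    A caller would only go on to compare st_dev against the parent (the
    "different filesystem" test) when this returns 0, so a failed stat()
    can no longer be mistaken for a filesystem boundary.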

    From this information it is my belief that this issue is due to a
    cache effect when a file or directory is created or recently created.
    The effect can cause lookups of the file or directory to fail, even
    though the creation actually succeeded.  When the cache effect goes
    away the entry becomes visible on the media.

    The directory appears to properly contain the entry, but it cannot be
    stat()'d etc.  But when the cache effect in the kernel self-corrects
    (probably due to simply being discarded), the entry can be stat()'d
    again.  This is different from the permanent media issue reported by
    Francois Tigeot.
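
    One way to hunt for the temporary case is to create files and stat()
    each one immediately afterwards, counting lookups that fail even
    though the create succeeded.  This is a minimal sketch under
    assumptions: count_stat_failures_after_create() and the "probe.*"
    file names are hypothetical, and nothing here is known to trigger
    the problem reliably.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: request POSIX declarations */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical probe: create `n` empty files in `dir`, stat() each one
 * right after creating it, then remove it again.
 * Returns the number of files whose stat() failed although the create
 * succeeded, or -1 if a file could not be created at all.
 */
int count_stat_failures_after_create(const char *dir, int n)
{
    char path[1024];
    struct stat st;
    int failures = 0;
    int i;

    assert(dir != NULL && n >= 0);

    for (i = 0; i < n; ++i) {
        int fd;
        int len = snprintf(path, sizeof(path), "%s/probe.%ld.%d",
                           dir, (long)getpid(), i);

        if (len < 0 || len >= (int)sizeof(path))
            return -1;

        fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd < 0)
            return -1;
        close(fd);

        /* The create succeeded, so a failing lookup here is suspicious. */
        if (stat(path, &st) != 0) {
            fprintf(stderr, "stat %s failed right after create: %s\n",
                    path, strerror(errno));
            ++failures;
        }
        unlink(path);
    }
    return failures;
}
```

    On a filesystem that behaves correctly the result should be 0.  A
    non-zero count on a HAMMER mount would be worth reporting together
    with the exact sequence of operations that led up to it.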

    So now I am trying to find a case where newly created or modified or
    renamed files somehow get confused in the kernel cache.  That's where
    I am.  No smoking gun yet but a lot of clues.
	
						-Matt





