namecache: locks and races
Matthew Dillon
dillon at apollo.backplane.com
Thu Feb 10 11:04:52 PST 2005
:Resent to newsgroup instead of ml.
:
:Thanks for the very informative answer.
:The check on memory addresses had me pretty confused :)
:
:...
:The patch didn't fix the race I trigger. I added a new diagnostic
:message which is printed before we execute "goto again;" in cache_inval.
:The following log shows pretty clearly what's happening.
:
:The dir structure is as I said before /afs/su.se/home/r/n/rnyberg/TEST/....
:
:Feb 10 12:20:21 doozer kernel: invalidnode: cache scrapped (r)
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: invalidnode: cache scrapped (home)
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:.. and so on, and so forth.
:
:nnpfs prints "cache scrapped(%s)" before calling cache_inval_vp.
:It succeeds in scrapping the r directory, but races on home.
:
: -Richard
Ahhh. This clears things up! Very nice debugging printfs. It makes
it completely obvious.
What is happening is that cache_resolve() is creating a child node
under home at the same time cache_inval() is trying to invalidate it.
Hmm. Ok. I think I might have been a bit overzelous in the retry code.
It isn't actually necessary to re-destroy new associations created
while cache_inval is recursing on the children. We only need to destroy
the *old* associations. This means we can take just a single-pass
through and get rid of the retry code entirely.
The only place where it really matters is in cache_rename(), but that
case can be handled in cache_rename() itself.
I really appreciate the work you are doing, the new namecache code is
the cornerstone of a great deal of already completed and up-and-coming
VFS code and we have to make sure it works properly.
Please try this patch.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>
Index: kern/vfs_cache.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_cache.c,v
retrieving revision 1.50
diff -u -r1.50 vfs_cache.c
--- kern/vfs_cache.c 2 Feb 2005 21:34:18 -0000 1.50
+++ kern/vfs_cache.c 10 Feb 2005 18:57:52 -0000
@@ -595,15 +595,27 @@
* from the leaves up as the recursion backs out.
*
* Note that the topology for any referenced nodes remains intact.
+ *
+ * It is possible for cache_inval() to race a cache_resolve(), meaning that
+ * the namecache entry may not actually be invalidated on return if it was
+ * revalidated while recursing down into its children. This code guarentees
+ * that the node(s) will go through an invalidation cycle, but does not
+ * guarentee that they will remain in an invalidated state.
+ *
+ * Returns non-zero if a revalidation was detected during the invalidation
+ * recursion, zero otherwise. Note that since only the original ncp is
+ * locked the revalidation ultimately can only indicate that the original ncp
+ * *MIGHT* no have been reresolved.
*/
-void
+int
cache_inval(struct namecache *ncp, int flags)
{
struct namecache *kid;
struct namecache *nextkid;
+ int rcnt = 0;
KKASSERT(ncp->nc_exlocks);
-again:
+
cache_setunresolved(ncp);
if (flags & CINV_DESTROY)
ncp->nc_flag |= NCF_DESTROYED;
@@ -620,36 +632,51 @@
TAILQ_FIRST(&kid->nc_list)
) {
cache_lock(kid);
- cache_inval(kid, flags & ~CINV_DESTROY);
+ rcnt += cache_inval(kid, flags & ~CINV_DESTROY);
cache_unlock(kid);
}
cache_drop(kid);
kid = nextkid;
}
cache_lock(ncp);
-
- /*
- * Someone could have gotten in there while ncp was unlocked,
- * retry if so.
- */
- if ((ncp->nc_flag & NCF_UNRESOLVED) == 0)
- goto again;
}
+
+ /*
+ * Someone could have gotten in there while ncp was unlocked,
+ * retry if so.
+ */
+ if ((ncp->nc_flag & NCF_UNRESOLVED) == 0)
+ ++rcnt;
+ return (rcnt);
}
/*
- * Invalidate a vnode's namecache associations.
+ * Invalidate a vnode's namecache associations. To avoid races against
+ * the resolver we do not invalidate a node which we previously invalidated
+ * but which was then re-resolved while we were in the invalidation loop.
+ *
+ * Returns non-zero if any namecache entries remain after the invalidation
+ * loop completed.
*/
-void
+int
cache_inval_vp(struct vnode *vp, int flags)
{
struct namecache *ncp;
+ struct namecache *next;
- while ((ncp = TAILQ_FIRST(&vp->v_namecache)) != NULL) {
- cache_get(ncp);
+ ncp = TAILQ_FIRST(&vp->v_namecache);
+ if (ncp)
+ cache_hold(ncp);
+ while (ncp) {
+ /* loop entered with ncp held */
+ if ((next = TAILQ_NEXT(ncp, nc_entry)) != NULL)
+ cache_hold(next);
+ cache_lock(ncp);
cache_inval(ncp, flags);
- cache_put(ncp);
+ cache_put(ncp); /* also releases reference */
+ ncp = next;
}
+ return(TAILQ_FIRST(&vp->v_namecache) != NULL);
}
/*
@@ -673,10 +700,19 @@
cache_rename(struct namecache *fncp, struct namecache *tncp)
{
struct namecache *scan;
+ int didwarn = 0;
cache_setunresolved(fncp);
cache_setunresolved(tncp);
- cache_inval(tncp, CINV_CHILDREN);
+ while (cache_inval(tncp, CINV_CHILDREN) != 0) {
+ if (didwarn++ % 10 == 0) {
+ printf("Warning: cache_rename: race during "
+ "rename %s->%s\n",
+ fncp->nc_name, tncp->nc_name);
+ }
+ tsleep(tncp, 0, "mvrace", hz / 10);
+ cache_setunresolved(tncp);
+ }
while ((scan = TAILQ_FIRST(&fncp->nc_list)) != NULL) {
cache_hold(scan);
cache_unlink_parent(scan);
Index: kern/vfs_subr.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.51
diff -u -r1.51 vfs_subr.c
--- kern/vfs_subr.c 2 Feb 2005 21:34:18 -0000 1.51
+++ kern/vfs_subr.c 9 Feb 2005 19:00:28 -0000
@@ -827,7 +827,10 @@
/*
* Scrap the vfs cache
*/
- cache_inval_vp(vp, 0);
+ while (cache_inval_vp(vp, 0) != 0) {
+ printf("Warning: vnode %p clean/cache_resolution race detected\n", vp);
+ tsleep(vp, 0, "vclninv", 2);
+ }
/*
* Check to see if the vnode is in use. If so we have to reference it
Index: sys/namecache.h
===================================================================
RCS file: /cvs/src/sys/sys/namecache.h,v
retrieving revision 1.18
diff -u -r1.18 namecache.h
--- sys/namecache.h 31 Jan 2005 17:17:58 -0000 1.18
+++ sys/namecache.h 10 Feb 2005 18:56:22 -0000
@@ -154,8 +154,8 @@
void cache_setunresolved(struct namecache *ncp);
struct namecache *cache_nlookup(struct namecache *par, struct nlcomponent *nlc);
struct namecache *cache_allocroot(struct mount *mp, struct vnode *vp);
-void cache_inval(struct namecache *ncp, int flags);
-void cache_inval_vp(struct vnode *vp, int flags);
+int cache_inval(struct namecache *ncp, int flags);
+int cache_inval_vp(struct vnode *vp, int flags);
void vfs_cache_setroot(struct vnode *vp, struct namecache *ncp);
int cache_resolve(struct namecache *ncp, struct ucred *cred);
More information about the Kernel
mailing list