namecache: locks and races

Matthew Dillon dillon at apollo.backplane.com
Thu Feb 10 11:04:52 PST 2005


:Resent to newsgroup instead of ml.
:
:Thanks for the very informative answer.
:The check on memory addresses had me pretty confused :)
:
:...
:The patch didn't fix the race I trigger. I added a new diagnostic
:message which is printed before we execute "goto again;" in cache_inval.
:The following log shows pretty clearly what's happening.
:
:The dir structure is as I said before /afs/su.se/home/r/n/rnyberg/TEST/....
:
:Feb 10 12:20:21 doozer kernel: invalidnode: cache scrapped (r)
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: invalidnode: cache scrapped (home)
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
:Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
:.. and so on, and so forth.
:
:nnpfs prints "cache scrapped(%s)" before calling cache_inval_vp.
:It succeeds in scrapping the r directory, but races on home.
:
:        -Richard

    Ahhh.  This clears things up!   Very nice debugging printfs.   It makes
    it completely obvious.

    What is happening is that cache_resolve() is creating a child node 
    under home at the same time cache_inval() is trying to invalidate it.

    Hmm.  Ok.  I think I might have been a bit overzelous in the retry code. 
    It isn't actually necessary to re-destroy new associations created
    while cache_inval is recursing on the children.  We only need to destroy
    the *old* associations.  This means we can take just a single-pass
    through and get rid of the retry code entirely.

    The only place where it really matters is in cache_rename(), but that
    case can be handled in cache_rename() itself.

    I really appreciate the work you are doing, the new namecache code is
    the cornerstone of a great deal of already completed and up-and-coming
    VFS code and we have to make sure it works properly.

    Please try this patch.


					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>

Index: kern/vfs_cache.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_cache.c,v
retrieving revision 1.50
diff -u -r1.50 vfs_cache.c
--- kern/vfs_cache.c	2 Feb 2005 21:34:18 -0000	1.50
+++ kern/vfs_cache.c	10 Feb 2005 18:57:52 -0000
@@ -595,15 +595,27 @@
  *			  from the leaves up as the recursion backs out.
  *
  * Note that the topology for any referenced nodes remains intact.
+ *
+ * It is possible for cache_inval() to race a cache_resolve(), meaning that
+ * the namecache entry may not actually be invalidated on return if it was
+ * revalidated while recursing down into its children.  This code guarentees
+ * that the node(s) will go through an invalidation cycle, but does not 
+ * guarentee that they will remain in an invalidated state. 
+ *
+ * Returns non-zero if a revalidation was detected during the invalidation
+ * recursion, zero otherwise.  Note that since only the original ncp is
+ * locked the revalidation ultimately can only indicate that the original ncp
+ * *MIGHT* no have been reresolved.
  */
-void
+int
 cache_inval(struct namecache *ncp, int flags)
 {
 	struct namecache *kid;
 	struct namecache *nextkid;
+	int rcnt = 0;
 
 	KKASSERT(ncp->nc_exlocks);
-again:
+
 	cache_setunresolved(ncp);
 	if (flags & CINV_DESTROY)
 		ncp->nc_flag |= NCF_DESTROYED;
@@ -620,36 +632,51 @@
 			    TAILQ_FIRST(&kid->nc_list)
 			) {
 				cache_lock(kid);
-				cache_inval(kid, flags & ~CINV_DESTROY);
+				rcnt += cache_inval(kid, flags & ~CINV_DESTROY);
 				cache_unlock(kid);
 			}
 			cache_drop(kid);
 			kid = nextkid;
 		}
 		cache_lock(ncp);
-
-		/*
-		 * Someone could have gotten in there while ncp was unlocked,
-		 * retry if so.
-		 */
-		if ((ncp->nc_flag & NCF_UNRESOLVED) == 0)
-			goto again;
 	}
+
+	/*
+	 * Someone could have gotten in there while ncp was unlocked,
+	 * retry if so.
+	 */
+	if ((ncp->nc_flag & NCF_UNRESOLVED) == 0)
+		++rcnt;
+	return (rcnt);
 }
 
 /*
- * Invalidate a vnode's namecache associations.
+ * Invalidate a vnode's namecache associations.  To avoid races against
+ * the resolver we do not invalidate a node which we previously invalidated
+ * but which was then re-resolved while we were in the invalidation loop.
+ *
+ * Returns non-zero if any namecache entries remain after the invalidation
+ * loop completed.
  */
-void
+int
 cache_inval_vp(struct vnode *vp, int flags)
 {
 	struct namecache *ncp;
+	struct namecache *next;
 
-	while ((ncp = TAILQ_FIRST(&vp->v_namecache)) != NULL) {
-		cache_get(ncp);
+	ncp = TAILQ_FIRST(&vp->v_namecache);
+	if (ncp)
+		cache_hold(ncp);
+	while (ncp) {
+		/* loop entered with ncp held */
+		if ((next = TAILQ_NEXT(ncp, nc_entry)) != NULL)
+			cache_hold(next);
+		cache_lock(ncp);
 		cache_inval(ncp, flags);
-		cache_put(ncp);
+		cache_put(ncp);		/* also releases reference */
+		ncp = next;
 	}
+	return(TAILQ_FIRST(&vp->v_namecache) != NULL);
 }
 
 /*
@@ -673,10 +700,19 @@
 cache_rename(struct namecache *fncp, struct namecache *tncp)
 {
 	struct namecache *scan;
+	int didwarn = 0;
 
 	cache_setunresolved(fncp);
 	cache_setunresolved(tncp);
-	cache_inval(tncp, CINV_CHILDREN);
+	while (cache_inval(tncp, CINV_CHILDREN) != 0) {
+		if (didwarn++ % 10 == 0) {
+			printf("Warning: cache_rename: race during "
+				"rename %s->%s\n",
+				fncp->nc_name, tncp->nc_name);
+		}
+		tsleep(tncp, 0, "mvrace", hz / 10);
+		cache_setunresolved(tncp);
+	}
 	while ((scan = TAILQ_FIRST(&fncp->nc_list)) != NULL) {
 		cache_hold(scan);
 		cache_unlink_parent(scan);
Index: kern/vfs_subr.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.51
diff -u -r1.51 vfs_subr.c
--- kern/vfs_subr.c	2 Feb 2005 21:34:18 -0000	1.51
+++ kern/vfs_subr.c	9 Feb 2005 19:00:28 -0000
@@ -827,7 +827,10 @@
 	/*
 	 * Scrap the vfs cache
 	 */
-	cache_inval_vp(vp, 0);
+	while (cache_inval_vp(vp, 0) != 0) {
+		printf("Warning: vnode %p clean/cache_resolution race detected\n", vp);
+		tsleep(vp, 0, "vclninv", 2);
+	}
 
 	/*
 	 * Check to see if the vnode is in use. If so we have to reference it
Index: sys/namecache.h
===================================================================
RCS file: /cvs/src/sys/sys/namecache.h,v
retrieving revision 1.18
diff -u -r1.18 namecache.h
--- sys/namecache.h	31 Jan 2005 17:17:58 -0000	1.18
+++ sys/namecache.h	10 Feb 2005 18:56:22 -0000
@@ -154,8 +154,8 @@
 void	cache_setunresolved(struct namecache *ncp);
 struct namecache *cache_nlookup(struct namecache *par, struct nlcomponent *nlc);
 struct namecache *cache_allocroot(struct mount *mp, struct vnode *vp);
-void	cache_inval(struct namecache *ncp, int flags);
-void	cache_inval_vp(struct vnode *vp, int flags);
+int	cache_inval(struct namecache *ncp, int flags);
+int	cache_inval_vp(struct vnode *vp, int flags);
 void	vfs_cache_setroot(struct vnode *vp, struct namecache *ncp);
 
 int	cache_resolve(struct namecache *ncp, struct ucred *cred);





More information about the Kernel mailing list