namecache: locks and races

Matthew Dillon dillon at apollo.backplane.com
Wed Feb 9 11:09:05 PST 2005


:Hello Matt,
:
:I'm making very good progress on my AFS port, but I still I have some
:questions about the namecache.
:
:Is it always risk for deadlock when trying to lock more than one ncp?
:or are there some situations that are safe? IIRC in the traditional
:BSD namecache it's safe to lock a child when you have locked the parent.
:
:I don't quite understand the following snippet from kern_rename.
:Could you explain the different cases and why they're safe?
:
:<code>
:	/*
:	 * relock the source ncp
:	 */
:	if (cache_lock_nonblock(fromnd->nl_ncp) == 0) {
:		cache_resolve(fromnd->nl_ncp, fromnd->nl_cred);
:	} else if (fromnd->nl_ncp > tond->nl_ncp) {
:		cache_lock(fromnd->nl_ncp);
:		cache_resolve(fromnd->nl_ncp, fromnd->nl_cred);
:	} else {
:		cache_unlock(tond->nl_ncp);
:		cache_lock(fromnd->nl_ncp);
:		cache_resolve(fromnd->nl_ncp, fromnd->nl_cred);
:		cache_lock(tond->nl_ncp);
:		cache_resolve(tond->nl_ncp, tond->nl_cred);
:	}
:	fromnd->nl_flags |= NLC_NCPISLOCKED;
:</code>
:
:Also, I've noticed that it's quite easy to provoke a livelock situation
:between cache_resolve and cache_inval. When I run the arla test suite
:there are lot of activity going on in directories like these:
:/afs/su.se/home/r/n/rnyberg/TEST/$HOST-$TESTNAME-$DATE
:
:What happens quite frequently is that nnpfs receives a callback on
:/afs/su.se/home and proceeds to do cache_inval_vp(vp, CINV_CHILDREN)
:on it. If we're unlucky cache_resolve and cache_inval competes
:to (un)resolve and I get tons and tons of "had ro recurse on home" in
:syslog. Sometimes the live lock breaks by me just starting a new process,
:sometimes I have to kill the process that's doing cache_resolve.
:
:The callback nnpfs gets is a message just telling it that the contents
:of a directory may have changed. Is the above call to cache_inval_vp
:the right thing to do?
:
:Thanks in advance,
:        -Richard

    Ah, you found one of my bad hacks :-)

    The answer is that you've found probably the last major item in the
    namecache API that needs to be fixed... that is the race between
    name resolution and name invalidation.  You will notice that it
    isn't locking up the entire system.

    The general rule for the namecache code is that a locked child can
    lock its parent.  This is the reverse of the FreeBSD vnode locking 
    rule.

    It should make sense if you think about it a bit.  The locked child 
    may need information from the parent to complete whatever it needs
    to complete.

    The rename code is a different situation.  The 'from' and 'to' will
    not one be the parent of the other or vise versa, so the rename
    code theoretically only has to deal with deadlocks against other
    renames.  In that case we can choose the order and I chose to order
    the locks simply based on the relative pointer addresses of the from
    and to.

    --

    Calling cache_inval_vp()... What you are doing is legal, but nasty.
    Invalidating an entire directory hierarchy (which is what CINV_CHILDREN
    will do) is not something you want to do if you can at all help it.
    If there's no choice, though, there's no choice.  It is meant to work.

    It is not supposed to create a livelock however, but looking at the code
    I can well see how this could occur.  I think what we need to do here
    is the same thing that the main cache_inval() loop does, which is 
    to keep its place in the loop rather then always take from the head
    of the list.

    I have enclosed a patch that *may* fix the problem, please try it out.
    It is NOT well tested.  If it does not fix the problem then try adding
    a delay in cache_resolve()'s retry case, e.g. tsleep(ncp, 0, "livelk", 1);
    (but I'm hoping that will not be necessary).

    --

    As for the 'had to recurse..' warnings... those are not errors but
    printf's I added to try to catch the situation.  Obviously it caught
    it big time with your tests!  We can put them under a sysctl so they
    don't clutter the console.  There is possibly a second livelock issue
    with the parent recursion by my feeling is that since it is resolving
    the parents to-down any livelock there ought to resolve itself.


					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>

Index: kern/vfs_cache.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_cache.c,v
retrieving revision 1.50
diff -u -r1.50 vfs_cache.c
--- kern/vfs_cache.c	2 Feb 2005 21:34:18 -0000	1.50
+++ kern/vfs_cache.c	9 Feb 2005 19:02:38 -0000
@@ -638,18 +638,32 @@
 }
 
 /*
- * Invalidate a vnode's namecache associations.
+ * Invalidate a vnode's namecache associations.  To avoid races against
+ * the resolver we do not invalidate a node which we previously invalidated
+ * but which was then re-resolved while we were in the invalidation loop.
+ *
+ * Returns non-zero if any namecache entries remain after the invalidation
+ * loop completed.
  */
-void
+int
 cache_inval_vp(struct vnode *vp, int flags)
 {
 	struct namecache *ncp;
+	struct namecache *next;
 
-	while ((ncp = TAILQ_FIRST(&vp->v_namecache)) != NULL) {
-		cache_get(ncp);
+	ncp = TAILQ_FIRST(&vp->v_namecache);
+	if (ncp)
+		cache_hold(ncp);
+	while (ncp) {
+		/* loop entered with ncp held */
+		if ((next = TAILQ_NEXT(ncp, nc_entry)) != NULL)
+			cache_hold(next);
+		cache_lock(ncp);
 		cache_inval(ncp, flags);
-		cache_put(ncp);
+		cache_put(ncp);		/* also releases reference */
+		ncp = next;
 	}
+	return(TAILQ_FIRST(&vp->v_namecache) != NULL);
 }
 
 /*
Index: kern/vfs_subr.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.51
diff -u -r1.51 vfs_subr.c
--- kern/vfs_subr.c	2 Feb 2005 21:34:18 -0000	1.51
+++ kern/vfs_subr.c	9 Feb 2005 19:00:28 -0000
@@ -827,7 +827,10 @@
 	/*
 	 * Scrap the vfs cache
 	 */
-	cache_inval_vp(vp, 0);
+	while (cache_inval_vp(vp, 0) != 0) {
+		printf("Warning: vnode %p clean/cache_resolution race detected\n", vp);
+		tsleep(vp, 0, "vclninv", 2);
+	}
 
 	/*
 	 * Check to see if the vnode is in use. If so we have to reference it
Index: sys/namecache.h
===================================================================
RCS file: /cvs/src/sys/sys/namecache.h,v
retrieving revision 1.18
diff -u -r1.18 namecache.h
--- sys/namecache.h	31 Jan 2005 17:17:58 -0000	1.18
+++ sys/namecache.h	9 Feb 2005 19:00:42 -0000
@@ -155,7 +155,7 @@
 struct namecache *cache_nlookup(struct namecache *par, struct nlcomponent *nlc);
 struct namecache *cache_allocroot(struct mount *mp, struct vnode *vp);
 void	cache_inval(struct namecache *ncp, int flags);
-void	cache_inval_vp(struct vnode *vp, int flags);
+int	cache_inval_vp(struct vnode *vp, int flags);
 void	vfs_cache_setroot(struct vnode *vp, struct namecache *ncp);
 
 int	cache_resolve(struct namecache *ncp, struct ucred *cred);





More information about the Kernel mailing list