namecache: locks and races
Matthew Dillon
dillon at apollo.backplane.com
Wed Feb 9 11:09:05 PST 2005
:Hello Matt,
:
:I'm making very good progress on my AFS port, but I still I have some
:questions about the namecache.
:
:Is it always risk for deadlock when trying to lock more than one ncp?
:or are there some situations that are safe? IIRC in the traditional
:BSD namecache it's safe to lock a child when you have locked the parent.
:
:I don't quite understand the following snippet from kern_rename.
:Could you explain the different cases and why they're safe?
:
:<code>
: /*
: * relock the source ncp
: */
: if (cache_lock_nonblock(fromnd->nl_ncp) == 0) {
: cache_resolve(fromnd->nl_ncp, fromnd->nl_cred);
: } else if (fromnd->nl_ncp > tond->nl_ncp) {
: cache_lock(fromnd->nl_ncp);
: cache_resolve(fromnd->nl_ncp, fromnd->nl_cred);
: } else {
: cache_unlock(tond->nl_ncp);
: cache_lock(fromnd->nl_ncp);
: cache_resolve(fromnd->nl_ncp, fromnd->nl_cred);
: cache_lock(tond->nl_ncp);
: cache_resolve(tond->nl_ncp, tond->nl_cred);
: }
: fromnd->nl_flags |= NLC_NCPISLOCKED;
:</code>
:
:Also, I've noticed that it's quite easy to provoke a livelock situation
:between cache_resolve and cache_inval. When I run the arla test suite
:there are lot of activity going on in directories like these:
:/afs/su.se/home/r/n/rnyberg/TEST/$HOST-$TESTNAME-$DATE
:
:What happens quite frequently is that nnpfs receives a callback on
:/afs/su.se/home and proceeds to do cache_inval_vp(vp, CINV_CHILDREN)
:on it. If we're unlucky cache_resolve and cache_inval competes
:to (un)resolve and I get tons and tons of "had ro recurse on home" in
:syslog. Sometimes the live lock breaks by me just starting a new process,
:sometimes I have to kill the process that's doing cache_resolve.
:
:The callback nnpfs gets is a message just telling it that the contents
:of a directory may have changed. Is the above call to cache_inval_vp
:the right thing to do?
:
:Thanks in advance,
: -Richard
Ah, you found one of my bad hacks :-)
The answer is that you've found probably the last major item in the
namecache API that needs to be fixed... that is the race between
name resolution and name invalidation. You will notice that it
isn't locking up the entire system.
The general rule for the namecache code is that a locked child can
lock its parent. This is the reverse of the FreeBSD vnode locking
rule.
It should make sense if you think about it a bit. The locked child
may need information from the parent to complete whatever it needs
to complete.
The rename code is a different situation. The 'from' and 'to' will
not one be the parent of the other or vise versa, so the rename
code theoretically only has to deal with deadlocks against other
renames. In that case we can choose the order and I chose to order
the locks simply based on the relative pointer addresses of the from
and to.
--
Calling cache_inval_vp()... What you are doing is legal, but nasty.
Invalidating an entire directory hierarchy (which is what CINV_CHILDREN
will do) is not something you want to do if you can at all help it.
If there's no choice, though, there's no choice. It is meant to work.
It is not supposed to create a livelock however, but looking at the code
I can well see how this could occur. I think what we need to do here
is the same thing that the main cache_inval() loop does, which is
to keep its place in the loop rather then always take from the head
of the list.
I have enclosed a patch that *may* fix the problem, please try it out.
It is NOT well tested. If it does not fix the problem then try adding
a delay in cache_resolve()'s retry case, e.g. tsleep(ncp, 0, "livelk", 1);
(but I'm hoping that will not be necessary).
--
As for the 'had to recurse..' warnings... those are not errors but
printf's I added to try to catch the situation. Obviously it caught
it big time with your tests! We can put them under a sysctl so they
don't clutter the console. There is possibly a second livelock issue
with the parent recursion by my feeling is that since it is resolving
the parents to-down any livelock there ought to resolve itself.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>
Index: kern/vfs_cache.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_cache.c,v
retrieving revision 1.50
diff -u -r1.50 vfs_cache.c
--- kern/vfs_cache.c 2 Feb 2005 21:34:18 -0000 1.50
+++ kern/vfs_cache.c 9 Feb 2005 19:02:38 -0000
@@ -638,18 +638,32 @@
}
/*
- * Invalidate a vnode's namecache associations.
+ * Invalidate a vnode's namecache associations. To avoid races against
+ * the resolver we do not invalidate a node which we previously invalidated
+ * but which was then re-resolved while we were in the invalidation loop.
+ *
+ * Returns non-zero if any namecache entries remain after the invalidation
+ * loop completed.
*/
-void
+int
cache_inval_vp(struct vnode *vp, int flags)
{
struct namecache *ncp;
+ struct namecache *next;
- while ((ncp = TAILQ_FIRST(&vp->v_namecache)) != NULL) {
- cache_get(ncp);
+ ncp = TAILQ_FIRST(&vp->v_namecache);
+ if (ncp)
+ cache_hold(ncp);
+ while (ncp) {
+ /* loop entered with ncp held */
+ if ((next = TAILQ_NEXT(ncp, nc_entry)) != NULL)
+ cache_hold(next);
+ cache_lock(ncp);
cache_inval(ncp, flags);
- cache_put(ncp);
+ cache_put(ncp); /* also releases reference */
+ ncp = next;
}
+ return(TAILQ_FIRST(&vp->v_namecache) != NULL);
}
/*
Index: kern/vfs_subr.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.51
diff -u -r1.51 vfs_subr.c
--- kern/vfs_subr.c 2 Feb 2005 21:34:18 -0000 1.51
+++ kern/vfs_subr.c 9 Feb 2005 19:00:28 -0000
@@ -827,7 +827,10 @@
/*
* Scrap the vfs cache
*/
- cache_inval_vp(vp, 0);
+ while (cache_inval_vp(vp, 0) != 0) {
+ printf("Warning: vnode %p clean/cache_resolution race detected\n", vp);
+ tsleep(vp, 0, "vclninv", 2);
+ }
/*
* Check to see if the vnode is in use. If so we have to reference it
Index: sys/namecache.h
===================================================================
RCS file: /cvs/src/sys/sys/namecache.h,v
retrieving revision 1.18
diff -u -r1.18 namecache.h
--- sys/namecache.h 31 Jan 2005 17:17:58 -0000 1.18
+++ sys/namecache.h 9 Feb 2005 19:00:42 -0000
@@ -155,7 +155,7 @@
struct namecache *cache_nlookup(struct namecache *par, struct nlcomponent *nlc);
struct namecache *cache_allocroot(struct mount *mp, struct vnode *vp);
void cache_inval(struct namecache *ncp, int flags);
-void cache_inval_vp(struct vnode *vp, int flags);
+int cache_inval_vp(struct vnode *vp, int flags);
void vfs_cache_setroot(struct vnode *vp, struct namecache *ncp);
int cache_resolve(struct namecache *ncp, struct ucred *cred);
More information about the Kernel
mailing list