namecache: locks and races

Richard Nyberg rnyberg at it.su.se
Wed Feb 9 07:11:45 PST 2005


Hello Matt,

I'm making very good progress on my AFS port, but I still have some
questions about the namecache.

Is there always a risk of deadlock when trying to lock more than one ncp,
or are there some situations that are safe? IIRC in the traditional
BSD namecache it's safe to lock a child when you have locked the parent.

I don't quite understand the following snippet from kern_rename.
Could you explain the different cases and why they're safe?

<code>
	/*
	 * relock the source ncp
	 */
	if (cache_lock_nonblock(fromnd->nl_ncp) == 0) {
		cache_resolve(fromnd->nl_ncp, fromnd->nl_cred);
	} else if (fromnd->nl_ncp > tond->nl_ncp) {
		cache_lock(fromnd->nl_ncp);
		cache_resolve(fromnd->nl_ncp, fromnd->nl_cred);
	} else {
		cache_unlock(tond->nl_ncp);
		cache_lock(fromnd->nl_ncp);
		cache_resolve(fromnd->nl_ncp, fromnd->nl_cred);
		cache_lock(tond->nl_ncp);
		cache_resolve(tond->nl_ncp, tond->nl_cred);
	}
	fromnd->nl_flags |= NLC_NCPISLOCKED;
</code>
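
My tentative reading is that the three branches are a try-lock followed
by address-ordered locking. Here is a minimal userland sketch of that
reading, assuming pthread mutexes stand in for ncp locks. lock_second()
and its parameter names are made up for illustration; they are not
kernel API, and the real namecache may well differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for "relock the source ncp": the caller already
 * holds `held` and wants `want` as well.
 *
 * Returns 1 if `held` had to be dropped and retaken (so whatever it
 * protects must be revalidated by the caller), 0 otherwise.
 */
static int
lock_second(pthread_mutex_t *held, pthread_mutex_t *want)
{
	assert(held != want);

	/* Case 1: try-lock never sleeps, so it cannot deadlock. */
	if (pthread_mutex_trylock(want) == 0)
		return 0;

	/* Case 2: `want` has the higher address; blocking here keeps
	 * the global low-to-high acquisition order. */
	if ((uintptr_t)want > (uintptr_t)held) {
		pthread_mutex_lock(want);
		return 0;
	}

	/* Case 3: wrong order; drop `held`, then take both low-to-high. */
	pthread_mutex_unlock(held);
	pthread_mutex_lock(want);
	pthread_mutex_lock(held);
	return 1;
}
```

In this sketch `held` plays the role of tond->nl_ncp and `want` the role
of fromnd->nl_ncp. A return value of 1 corresponds to the last branch
above, where both entries are resolved again because the target was
unlocked for a while.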

Also, I've noticed that it's quite easy to provoke a livelock situation
between cache_resolve and cache_inval. When I run the arla test suite
there is a lot of activity going on in directories like this one:
/afs/su.se/home/r/n/rnyberg/TEST/$HOST-$TESTNAME-$DATE

What happens quite frequently is that nnpfs receives a callback on
/afs/su.se/home and proceeds to do cache_inval_vp(vp, CINV_CHILDREN)
on it. If we're unlucky, cache_resolve and cache_inval compete
to (un)resolve and I get tons and tons of "had to recurse on home" in
syslog. Sometimes the livelock breaks when I just start a new process,
sometimes I have to kill the process that's doing cache_resolve.

The callback nnpfs gets is a message just telling it that the contents
of a directory may have changed. Is the above call to cache_inval_vp
the right thing to do?

Thanks in advance,
        -Richard





