namecache: locks and races
Richard Nyberg
rnyberg at it.su.se
Thu Feb 10 05:02:09 PST 2005
Resent to the newsgroup instead of the mailing list.
Thanks for the very informative answer.
The check on memory addresses had me pretty confused :)
At Wed, 9 Feb 2005 11:06:04 -0800 (PST),
Matthew Dillon wrote:
<snip informative answer>
> --
>
> Calling cache_inval_vp()... What you are doing is legal, but nasty.
> Invalidating an entire directory hierarchy (which is what CINV_CHILDREN
> will do) is not something you want to do if you can at all help it.
> If there's no choice, though, there's no choice. It is meant to work.
>
> It is not supposed to create a livelock however, but looking at the code
> I can well see how this could occur. I think what we need to do here
> is the same thing that the main cache_inval() loop does, which is
> to keep its place in the loop rather than always take from the head
> of the list.
>
> I have enclosed a patch that *may* fix the problem, please try it out.
> It is NOT well tested. If it does not fix the problem then try adding
> a delay in cache_resolve()'s retry case, e.g. tsleep(ncp, 0, "livelk", 1);
> (but I'm hoping that will not be necessary).
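To make the difference between "always take from the head of the list" and "keep its place in the loop" concrete, here is a toy user-space sketch. It is not the kernel code and not the enclosed patch: struct entry, racer_resolves() and both inval_* functions are made-up names, and the "racer" is simply assumed to re-resolve every entry immediately after it is invalidated.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a namecache entry; NOT the kernel's struct namecache. */
struct entry {
    const char *name;
    int resolved;               /* 1 = resolved, 0 = invalidated */
};

/* Hypothetical racer: some other thread re-resolves the entry right
 * after we invalidate it (assumption made for this sketch only). */
static void racer_resolves(struct entry *e)
{
    e->resolved = 1;
}

/* Style 1: after every invalidation, rescan from the head of the list.
 * Returns the number of invalidations, or -1 once the budget is used up
 * (the budget exists only so the sketch terminates). */
static int inval_restart_from_head(struct entry *list, size_t n, int budget)
{
    int done = 0;
    size_t i;

    assert(list != NULL);
again:
    for (i = 0; i < n; i++) {
        if (list[i].resolved) {
            if (done == budget)
                return -1;      /* gave up: no forward progress */
            list[i].resolved = 0;
            done++;
            racer_resolves(&list[i]);
            goto again;         /* head is resolved again, so we spin */
        }
    }
    return done;
}

/* Style 2: keep our place in the list and never revisit earlier entries.
 * Returns the number of invalidations performed. */
static int inval_keep_place(struct entry *list, size_t n)
{
    int done = 0;
    size_t i;

    assert(list != NULL);
    for (i = 0; i < n; i++) {
        if (list[i].resolved) {
            list[i].resolved = 0;
            done++;
            racer_resolves(&list[i]);   /* racer wins, but we move on */
        }
    }
    return done;
}
```

With three resolved entries, the first style exhausts any budget on the first entry alone, while the second style finishes after three invalidations. Note that in the second style the entries may end up resolved again by the racer; the sketch only shows that the loop itself terminates.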
The patch didn't fix the race I trigger. I added a new diagnostic
message which is printed before we execute "goto again;" in cache_inval.
The following log shows pretty clearly what's happening.
The dir structure is as I said before /afs/su.se/home/r/n/rnyberg/TEST/....
Feb 10 12:20:21 doozer kernel: invalidnode: cache scrapped (r)
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: invalidnode: cache scrapped (home)
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
... and so on, and so forth.
nnpfs prints "cache scrapped(%s)" before calling cache_inval_vp.
It succeeds in scrapping the r directory, but races on home.
-Richard