namecache: locks and races

Richard Nyberg rnyberg at it.su.se
Thu Feb 10 05:02:09 PST 2005


Resent to the newsgroup instead of the mailing list.

Thanks for the very informative answer.
The check on memory addresses had me pretty confused :)

At Wed, 9 Feb 2005 11:06:04 -0800 (PST),
Matthew Dillon wrote:

<snip informative answer>

>     --
>
>     Calling cache_inval_vp()... What you are doing is legal, but nasty.
>     Invalidating an entire directory hierarchy (which is what CINV_CHILDREN
>     will do) is not something you want to do if you can at all help it.
>     If there's no choice, though, there's no choice.  It is meant to work.
> 
>     It is not supposed to create a livelock, but looking at the code
>     I can well see how this could occur.  I think what we need to do here
>     is the same thing that the main cache_inval() loop does, which is
>     to keep its place in the loop rather than always take from the head
>     of the list.
> 
>     I have enclosed a patch that *may* fix the problem, please try it out.
>     It is NOT well tested.  If it does not fix the problem then try adding
>     a delay in cache_resolve()'s retry case, e.g. tsleep(ncp, 0, "livelk", 1);
>     (but I'm hoping that will not be necessary).

The patch didn't fix the race I'm triggering. I added a new diagnostic
message, printed just before we execute "goto again;" in cache_inval().
The following log shows pretty clearly what's happening.

The directory structure is, as I said before, /afs/su.se/home/r/n/rnyberg/TEST/....

Feb 10 12:20:21 doozer kernel: invalidnode: cache scrapped (r)
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: invalidnode: cache scrapped (home)
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on r
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_inval: again on: home
Feb 10 12:20:21 doozer kernel: [diagnostic] cache_resolve: had to recurse on home
... and so on, and so forth.

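nnpfs prints "cache scrapped (%s)" before calling cache_inval_vp().
It succeeds in scrapping the r directory, but races on home:
cache_inval keeps restarting on home while cache_resolve recurses on
home and r, and the loop never makes progress.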

        -Richard





