namecache coherency, 2nd turn

Wed Feb 8 00:21:13 PST 2006

On 2006-02-07, Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:
>:   - Unnecessary: shadow associatons can just be finely torn where they
>:     were created: in the resolver routine. Propagating unresolution will
>:     yield control back to the resolver routine, even if group topology is
>:     left intact; and then the resolver can start its work by tearing
>:     the link so that staleness (eg., in rename) will be avoided.
>
>     Ok.  In order for this to work you have to make sure that a lock
>     contender does the right thing when it is torn down.  e.g. lets
>     say you have:
>
>      A  -> B
>     (L)
>
>     Where A holds the lock and is currently being resolved, and B is blocked
>     trying to get the lock.
>
>     During the resolution operation, the resolver decides to tear down
>     A->B.
>
>     What happens to B which is currently blocked on A's lock ?  Or, for
>     that matter, what happens to A if the lock structure resides in B?

Yes, with my implementation the lock structure would reside in B.
When you do the A -> B cut, the group is locked, so after the cut B will
be in a locked state with in a natural way. Furthermore, my code makes
sure that the locking stuff in A is set to "locked" before the cut, so
when the cut takes place, A will end up locked, too.

Interface-wise it would be a cache_shadow_detach(A) call, which would
then return B. If the caller doesn't need B she can just cache_put() it,
making it accessible for contenders. (Actually, I always do it
like "cache_put(cache_shadow_detach(ncp))", as I never wanted to do
anything with the "B" part [maybe the cache_put() call should be integrated
into cache_shadow_detach()...]).

>     What my worry is here is that such a detachment can result in B
>     accessing a non-existent lock if the A->B association gets torn down
>     and A is then destroyed (free()'d).

After cache_put(cache_shadow_detach(A)) no trace of A will remain at
B...

I added a "cache_put(cache_shadow_detach(ncp))" to cache_zap() too, for the
case when A is destroyed with the association still in place.

>     This is why I wanted a shadowinfo structure to hold the lock, instead
>     of having the lock be specifically embedded in one of the namecache
>     records.

Upon splitting a group, we can do a "lock migration" to set up lock
locations/states for all parties as it's desired. We *can* do it, be the
"shadowinfo" embedded or not. Moreover, we'll *have* to do it as well,
and I don't see why a separate shadowinfo would ease this procedure.

>     I may have misspoke when I talked about that in my last
>     posting on the topic.  The lock really does have to be an independant
>     structure if there are any associations at all.  It can only be 
>     embedded if the namecache record is entirely alone, with no associations
>     whatsoever.

Given that we take the effort to allocate this data with namecache
structures anyway, why would we want to bother with separate
malloc / initializer / destructor / refcounting / gc ? If these
operations were trivialized with eg. keeping a canonical reference to a
shadowinfo, which exists before and after all other references, we end
up very close to the "always embedded" case, just we'll have a malloc
overhead too...

>:     This impiled a very funky locking interface: eg, a typical
>:     null_nfoo() looked somehow like something like
>:
>:	...
>:        ap->a_ncp = ncp->nc_shadowed;
>:	cache_lock(ap->a_ncp);
>:	vop_nfoo_ap(ap);
>:	cache_unlock(ap->a_ncp);
>
>     I think this would be a bit less complicated with a shadowinfo
>     structure containing the lock which is independant of the namecache
>     records.

Yes, with the "always embedded" approach it's more problematic to
organize the location of per-entry fallback info. But let just get over
such subtleties and assume we provide a superb bookkeeping service, this
way or that.

With the unstable model, it still remains that we would have to maintain
these fallback infos from the API user side. That's the real cruft. That's
why we'd need the cache_[un]lock in the above code snippet.

Instead of seeking ways for making the bookkeeping rock, isn't it
better to seek the way of getting rid of it all? (Given the latter is
possible, which seems to be the case.)

>:     All in all, the unstable approach ended up in a programming model I
>:     couldn't make consistent. Each fixed panic resulted in seven new
>:     one, like when fighting the hydra :) With the stable approach,
>:     post-split state synchronization is always a local issue which
>:     can be handled trivially. With that, causes of panics were just
>:     slightly more than typos.
>
>     I'm still studying the code, but I agree that both approaches are
>     going to have similar issues with teardowns and merges.  My biggest
>     concern is simply that a contended lock does not get accidently 
>     destroyed while there are still entities blocked on it.

I hope my above reply block will alleviate these concerns.

As I understand the namecache locking model, iterated locking is
allowed, but only in a restricted way, for short local periods. That is,
you can do

    cache_lock(x);
    cache_lock(x);
    cache_lock(x);
    i++;
    cache_unlock(x);
    cache_unlock(x);
    cache_unlock(x);

(such things might be useful if you are unsure about the lock state of
some pieces), but you should *never* do 

    cache_lock(x);
    cache_lock(x);
    cache_lock(x);
    big_scary_external_function_call(foo);
    cache_unlock(x);
    cache_unlock(x);
    cache_unlock(x);

As long as this perception is right, my code shouldn't have problems
with leaving around stale locks.

>:   - Renaming. I rewrote cache_rename() for two reasons:
>:   
>:      - I felt it's not deadlock safe: cache_inval(tncp) is called
>:        with fncp locked, and cache_inval does many locking on and off.
>:	I didn't see any constraint which could make us relaxed wrt.
>:	a deadlock on eg. fncp and some child of tncp... maybe it's just my
>:	short-sightedness.
>
>     This should be ok because fncp will not be a child of tncp, or 
>     vise-versa.  This is a requirement of the rename and there is
>     code to specifically check for and disallow the case.

Hm, it works as long as all of the "lock two entry" functions lock
parent before child. Which might be true for the original code.

But the group attach code also needs to lock two entries at a time, and
it doesn't have assumptions regarding parent/child relations. Should it
have? Of course, ideally the overlay hierarchy should be kept orthogonal
to the the file tree hiearchy, but there is nothing in the code to make
this explicit...

>     The only thing left after that to prevent a deadlock is to simply
>     ensure that fncp and tncp are properly ordered.  The order need only
>     be consistent, which can be done with a simple pointer comparison. 
>     Something like this:
>
>     if (fncp < tncp) {
> 	lock fncp
> 	lock tncp
>     } else {
> 	lock tncp
> 	lock fncp
>     }
>
>     Theoretically, anyway.

I have an aux routine (cache_lock_two) which just does this, and
both of my cache_rename() and cache_shadow_attach() functions
build on it.

Regards,
Csaba