namecache coherency, 2nd turn

Tue Feb 7 13:44:07 PST 2006

:Hi,
:
:Here is a revamp of the namecache coherency code (and nullfs).
:
:I tried to follow the ideas we sort of agreed about in the thread "a take at
:cache coherency", but real life wanted it differently sometimes...
:
:  See the patch itself at
:
:  http://leaf.dragonflybsd.org/mailarchive/submit/2006-02/msg00021.html
:
: - Locking: the code implements the ideas Corecode and Matt were
:   suggesting: groups are lockable via each member (O(1) locking); each
:...
:... (lots of good stuff) ...

: - Synchronization via groups: here is the greatest difference from what
:   we agreed on (or my understanding went astray the furthest here).
:
:   Basically, the point is that groups are for synchronizing locks, and
:   that negative events (more concretely, unresolution) are propagated
:   upwards (and that's all: groups are used for nothing else).
:   But within that, there are two alternatives:
:
:   - Unstable groups: negative events blow up groups.
:   
:   - Stable groups: the namecache backend code *never* splits up groups
:     (except at the end of a node's life cycle, during zapping), and each
:     shadow association is expected to be torn at the layer where it was
:     made.
:
:   In http://leaf.dragonflybsd.org/mailarchive/kernel/2006-01/msg00117.html
:   Matt says:
:
:  % The shadow chain would NEVER be allowed to be stale.  It could be
:  % partially unresolved, but the linkages cannot ever be stale (i.e.
:  % cannot ever be incorrect).  This generally means that an unresolved
:  % node needs to be disconnected from the chain.  The re-resolution of
:  % the node would reconnect it.
:
:   -- from which I conclude that he'd go for unstable groups. (Although it's
:   true that he doesn't exactly specify where the disconnection should
:   happen, so the contrary might as well be true).
:
:   I started going on to implement unstable groups, but ended up with
:   stable groups, claiming that unstable groups are both unnecessary and
:   are a dead end.
:
:   - Unnecessary: shadow associations can just be finely torn where they
:     were created: in the resolver routine. Propagating unresolution will
:     yield control back to the resolver routine, even if group topology is
:     left intact; and then the resolver can start its work by tearing
:     the link so that staleness (e.g., in rename) will be avoided.

    Ok.  In order for this to work you have to make sure that a lock
    contender does the right thing when it is torn down.  e.g. let's
    say you have:

     A  -> B
    (L)

    Where A holds the lock and is currently being resolved, and B is blocked
    trying to get the lock.

    During the resolution operation, the resolver decides to tear down
    A->B.

    What happens to B, which is currently blocked on A's lock?  Or, for
    that matter, what happens to A if the lock structure resides in B?

    What my worry is here is that such a detachment can result in B
    accessing a non-existent lock if the A->B association gets torn down
    and A is then destroyed (free()'d).

    This is why I wanted a shadowinfo structure to hold the lock, instead
    of having the lock be specifically embedded in one of the namecache
    records.  I may have misspoken when I talked about that in my last
    posting on the topic.  The lock really does have to be an independent
    structure if there are any associations at all.  It can only be
    embedded if the namecache record is entirely alone, with no associations
    whatsoever.
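
    As a rough illustration of that idea, here is a minimal sketch in
    which the lock state lives in its own reference-counted structure,
    so tearing down A->B and freeing A cannot pull the lock out from
    under B.  All of the names below (shadowinfo_sketch, ncp_sketch,
    si_hold, si_drop, ncp_attach, ncp_detach) are hypothetical and are
    not taken from the patch or from the existing namecache code.  The
    lock itself is reduced to a plain int, and the counter is not
    atomic, so this only shows the lifetime rule, not real locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: lock state shared by every record in a shadow group. */
struct shadowinfo_sketch {
    int refs;    /* attached records plus anyone blocked on the lock */
    int locked;  /* toy lock word standing in for the real lock */
};

/* Hypothetical stand-in for a namecache record. */
struct ncp_sketch {
    struct shadowinfo_sketch *info;
};

static struct shadowinfo_sketch *
si_alloc(void)
{
    /* calloc zeroes the structure: refs == 0, locked == 0 */
    return calloc(1, sizeof(struct shadowinfo_sketch));
}

static void
si_hold(struct shadowinfo_sketch *si)
{
    ++si->refs;
}

/* Drop one reference; returns 1 if this was the last one and si was freed. */
static int
si_drop(struct shadowinfo_sketch *si)
{
    assert(si->refs > 0);
    if (--si->refs == 0) {
        free(si);
        return 1;
    }
    return 0;
}

/* Attach a record to the shared lock structure. */
static void
ncp_attach(struct ncp_sketch *ncp, struct shadowinfo_sketch *si)
{
    si_hold(si);
    ncp->info = si;
}

/* Detach a record; the lock structure survives while other refs remain. */
static int
ncp_detach(struct ncp_sketch *ncp)
{
    struct shadowinfo_sketch *si = ncp->info;

    ncp->info = NULL;
    return si_drop(si);
}
```

    The point of the sketch is only that a blocked contender takes its
    own reference (si_hold) before sleeping, so the structure holding
    the lock goes away with the last reference rather than with any
    particular namecache record.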

:   - Dead end:
:   
:      ***
:      Feel free to skip this if you find the "unnecessary"
:      argument convincing enough.
:      ***
:     
:     When blowing up a group, each part will acquire a self contained
:     locking state. It's the disconnector's responsibility to set those
:     correctly. It can't divine the necessary state without hints.
:     Therefore I didn't see any other solution than keeping the local
:     locking data of each node up to date (even if it was ignored by the
:     cache_lock mechanism of groups), so that in case of the node
:     becoming disconnected and self contained, it will be just-in-time
:     equipped with the proper locking state.
:
:     This implied a very funky locking interface: e.g., a typical
:     null_nfoo() looked something like
:
:        ...
:        ap->a_ncp = ncp->nc_shadowed;
:        cache_lock(ap->a_ncp);
:        vop_nfoo_ap(ap);
:        cache_unlock(ap->a_ncp);

    I think this would be a bit less complicated with a shadowinfo
    structure containing the lock which is independent of the namecache
    records.

:     All in all, the unstable approach ended up in a programming model I
:     couldn't make consistent. Each fixed panic resulted in seven new
:     ones, like when fighting the hydra :) With the stable approach,
:     post-split state synchronization is always a local issue which
:     can be handled trivially. With that, causes of panics were just
:     slightly more than typos.

    I'm still studying the code, but I agree that both approaches are
    going to have similar issues with teardowns and merges.  My biggest
    concern is simply that a contended lock does not get accidentally
    destroyed while there are still entities blocked on it.

:   - Renaming. I rewrote cache_rename() for two reasons:
:   
:      - I felt it's not deadlock safe: cache_inval(tncp) is called
:        with fncp locked, and cache_inval does a lot of locking and
:        unlocking. I didn't see any constraint which could make us
:        relaxed wrt. a deadlock on e.g. fncp and some child of tncp...
:        maybe it's just my short-sightedness.

    This should be ok because fncp will not be a child of tncp, or
    vice versa.  This is a requirement of the rename and there is
    code to specifically check for and disallow the case.

    The only thing left after that to prevent a deadlock is to simply
    ensure that fncp and tncp are properly ordered.  The order need only
    be consistent, which can be done with a simple pointer comparison. 
    Something like this:

    if (fncp < tncp) {
        cache_lock(fncp);
        cache_lock(tncp);
    } else {
        cache_lock(tncp);
        cache_lock(fncp);
    }

    Theoretically, anyway.
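
    For what it's worth, here is a small userland sketch of the same
    ordering rule, using pthread mutexes in place of the namecache
    lock.  The ncp_sketch structure and the first_to_lock(), lock_pair()
    and unlock_pair() helpers are made up for the illustration and do
    not exist in the kernel.  The sketch assumes the two records are
    distinct, which the rename code already has to guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a namecache record with its own lock. */
struct ncp_sketch {
    pthread_mutex_t lock;
};

/* Return whichever record sits at the lower address; it is locked first. */
static struct ncp_sketch *
first_to_lock(struct ncp_sketch *fncp, struct ncp_sketch *tncp)
{
    /* Compare as integers: '<' on unrelated pointers is not portable C. */
    return ((uintptr_t)fncp < (uintptr_t)tncp) ? fncp : tncp;
}

/* Lock both records in ascending address order, whatever the call order. */
static void
lock_pair(struct ncp_sketch *fncp, struct ncp_sketch *tncp)
{
    struct ncp_sketch *first = first_to_lock(fncp, tncp);
    struct ncp_sketch *second = (first == fncp) ? tncp : fncp;

    assert(fncp != tncp);  /* assumption: rename never passes the same record */
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
}

/* Unlock both records; release order does not matter for deadlock safety. */
static void
unlock_pair(struct ncp_sketch *fncp, struct ncp_sketch *tncp)
{
    pthread_mutex_unlock(&fncp->lock);
    pthread_mutex_unlock(&tncp->lock);
}
```

    Two threads calling lock_pair(a, b) and lock_pair(b, a) at the same
    time both go for the lower-addressed record first, so neither can
    end up holding one lock while waiting on the other.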

:I'm a bit confused about the purpose of kernel@ and submit@, is it OK as
:I tend to do, discussion on kernel@, code on submit@ ? Or should I keep
:discussion on submit@ as well? 
:
:Csaba

    Either makes sense for the involved patches and discussion we are
    having.  You'll notice I posted my BUF/BIO patches to kernel@.

    submit@ is used more for one-off submissions or submissions with only
    short discussions.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>