a take at cache coherency

Matthew Dillon dillon at apollo.backplane.com
Tue Jan 24 10:41:56 PST 2006

    These are all very good ideas.  I do have to agree with Simon re:
    the original patch.  Its a bit too much of a hack.  I still believe
    that nearly all the work should be in the cache_lock/unlock code 
    and that most of the other cache code should work as it does now.
    Clearly resolution and invalidation are special cases, but I do not
    want the special cases to be visible to the VFS (insofar as it is 
    possible to shield the VFS from the layering).

    I will interject a few points:

    (1) We can't share vnode pointers.  We want the overlay system to have
	the flexibility of being able to translate data as well as namespace,
	even though nullfs doesn't need to do this now.

    (2) For renames and other namespace operations, it should be sufficient
	to simply INVALIDATE the upper layers rather then attempt to 
	actually adjust them to perfection.

	This could be particularly important for something like rename if
	an upper layer needs to do some sort of filename translation.

	When an invalid entry is encountered by a normal operation, like
	a rename or open or something like that, it will be resolved.  So
	the work of resynchronizing the entries would be done by the

    (3) For the shadow group, I think all that really needs to be shared
	here is the 'lock'.  Definitely not the vnode pointer.  Even though
	entries are grouped together, the nc_vp is an integral part of
	the 'namecache' structure, not the shadow group.

	Even if it happens to wind up being the same in most cases the
	partial invalidation that must be done on parent namecache records
	(in the parent layers) means that some namecache records will be
	marked unresolved while other deeper layers will be marked
	resolved.  Currently a combination of nc_vp and nc_flags is used
	to determine this state.  It is very, very fragile and I don't
	want to change that code at all.

:We don't need to linearize them. We can use the same tree layout as in
:the fs hierarchy: pointers in one direction, list of neighbours in the
:other direction.
:If we had such a flattened version: we would either have it as a
:substitute for the tree layout, or as an addendum. (I don't see which
:one of these was suggested by you, so I discuss both.)
:Regarding the former: I don't think it would be sufficient. While the
:cache layer could get along happily with flat groups, overlays operate
:with direct "x shadows y" relations, which mans we have the tree.
:So, from this on I'll assume the latter: having the flat thing as an

    I would treat the VFS's as separate trees.  The only commonality
    is the 'shadow group' that correlates a set of entries from separate
    trees and indicates a relationship between them.

    That is, there are two distinct data structures here... there is the
    normal namespace tree, and then there is shadow group.  They must
    be considered separately.

:> - have a ncp_shadowroot
:> - have a ncp_shadowlink which maintains a single linked list to all 
:> other entries in the group
:> - this way we don't have to run through all entries every time when 
:> doing locking.  we just lock ncp->ncp_shadowroot->ncp_exlocks++.
:So then, the ncp_shadow{root,link} fields would serve as some kind of
:cache. With all the befenits and drawbacks of a cache: you can retrieve
:some information immediately, but you have to take care about the
:validity of cached information.

    I'm a little confused over 'shadowroot' and 'shadowlink'.  How about
    'shadowinfo' and 'shadowlink' ?  The shadowinfo would be a common
    structure that all entries in a particular shadow group would refer
    to, which would contain the lock.  shadowlink would be the topological
    glue to link the namecache entries from the disparate trees which are
    in the same shadow group together.

    Shadowlink could very easily just be a CIRCULAR singly linked list.
    If the only operation that needs to be done on parents is an 
    invalidation, and the lock is shared, then one can simply iterate
    forwards until one wraps to the highest layer, and continue iterating
    invalidating namecache records until the original entry is encountered

:> time	thread1			thread2
:> 1	lock null		lock lower
:> 2	resolve null (blocks)	lock null (blocks)
:> we have a deadlock.  resolving null means having to lock lower, but this 
:> fails as another thread already obtained the lock.
:Locking is a group operation. Therefore it must be group atomic -- eg.,
:locking the group can't mean obtaining a lock (in the old sense) on each
:entry simultaneously.
:There must be one distinguished parameter in the group which carries
:locked/unlocked state. In my approach it's the nc_exlocks field of the
:group head. If a thread has that bumped, it has obtained the lock on the
:whole group.
    Which would be the shared shadowinfo structure.

    In order to reduce confusion, the shadowinfo structure would exist
    for all namecache entries whether they are shadowed or not.  We 
    could simply embed an internal version in each namecache structure
    and have the shadowinfo pointer point to it as a degenerate case.

					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>

More information about the Kernel mailing list