a take at cache coherency

Csaba Henk csaba.henk at creo.hu
Tue Jan 24 22:22:15 PST 2006

On 2006-01-24, Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:
>     These are all very good ideas.  I do have to agree with Simon re:
>     the original patch.  It's a bit too much of a hack.  I still believe
>     that nearly all the work should be in the cache_lock/unlock code 
>     and that most of the other cache code should work as it does now.
>     Clearly resolution and invalidation are special cases, but I do not
>     want the special cases to be visible to the VFS (insofar as it is 
>     possible to shield the VFS from the layering).

I have a version now which is free from these symptoms (with only
vfs_cache.c and namecache.h modified). However, compiling, debugging and
posting it would probably be a waste of time, because the rest of your
post redraws the picture.

(When I say "I do foo like this" I refer to this updated version, not the
first one which was posted.)

>     (2) For renames and other namespace operations, it should be sufficient
> 	to simply INVALIDATE the upper layers rather than attempt to 
> 	actually adjust them to perfection.
> 	This could be particularly important for something like rename if
> 	an upper layer needs to do some sort of filename translation.
> 	When an invalid entry is encountered by a normal operation, like
> 	a rename or open or something like that, it will be resolved.  So
> 	the work of resynchronizing the entries would be done by the
> 	resolver.

That's how I treat renames, although a simple invalidation (in the sense
of cache_inval) seems to be insufficient for getting renames right: the
shadowing relations must be torn down and recalculated, too.

But if you want to follow this scheme in general, the current code will
wreck it, because there are cases where a positive action is taken
without asking the vfs.

E.g., there are cache_setvp() calls popping up here and there. Now say
nullnode shadows foonode, and nullnode is unresolved. The plan is that
nullnode will sync itself up to foonode in null_nresolve (foonode will be
resolved at that point, if for no other reason than that null_nresolve
itself forwards the resolution request to foonode).

Now some function is invoked which has a cache_setvp() call in it, and
cache_setvp() happens to hit nullnode. nullnode then gets resolved
mercilessly. foonode also gets resolved independently. They are now out
of sync despite the efforts of the null layer, because null_nresolve was
left out of the game.

So, to achieve "shadowability", either the cache_setvp() calls should be
purged from general vfs code, or cache_setvp() itself has to propagate
its effect through the shadow chain.
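
To make the second option concrete, here is a minimal sketch. It uses toy
structs, not the real namecache: nc_sketch, nc_resolved, nc_shadowed and
setvp_chain are made-up names for illustration. It also simply hands the
same vp to every entry in the chain, which is the very sharing objected to
in (1) below, so a real version would need some per-layer translation
step instead.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; only the fields needed here. */
struct vnode {
	int id;
};

struct nc_sketch {
	struct vnode     *nc_vp;       /* resolved vnode, NULL if none */
	int               nc_resolved; /* stands in for the nc_flags test */
	struct nc_sketch *nc_shadowed; /* hypothetical: entry this one shadows */
};

/*
 * Resolve ncp and everything below it in the shadow chain in one go,
 * so that no entry in the chain is resolved behind the others' backs.
 */
void
setvp_chain(struct nc_sketch *ncp, struct vnode *vp)
{
	assert(ncp != NULL);
	for (; ncp != NULL; ncp = ncp->nc_shadowed) {
		ncp->nc_vp = vp;
		ncp->nc_resolved = 1;
	}
}
```

Note that with only a downward pointer, a call that hits foonode first
would still leave nullnode behind; covering that direction needs some
kind of back-link as well.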

There are issues with the shadowability of other things like fsmid, too.

>     (1) We can't share vnode pointers.  We want the overlay system to have
> 	the flexibility of being able to translate data as well as namespace,
> 	even though nullfs doesn't need to do this now.
>     (3) For the shadow group, I think all that really needs to be shared
> 	here is the 'lock'.  Definitely not the vnode pointer.  Even though
> 	entries are grouped together, the nc_vp is an integral part of
> 	the 'namecache' structure, not the shadow group.
> 	Even if it happens to wind up being the same in most cases the
> 	partial invalidation that must be done on parent namecache records
> 	(in the parent layers) means that some namecache records will be
> 	marked unresolved while other deeper layers will be marked
> 	resolved.  Currently a combination of nc_vp and nc_flags is used
> 	to determine this state.  It is very, very fragile and I don't
> 	want to change that code at all.

My idea was that the basic relation of cache coherency is "x shadows y",
in the sense that x wants all its resolution-related data kept in sync
with y.

Now you say that the coherency layer doesn't even know of such things,
it's just concerned about locking; and that the exact way and extent of
shadowing should reside in vfs scope.

However, the namecache struct doesn't have an fs-specific private data
field (I suppose by design). How, then, could the fs associate
shadowing-related info with a namecache node? If there is no strict
"x shadows y" relation but an overlay aspires to translate data, then
having just an nc_shadowed field is pretty pointless... (unless it's of
type "void *", which we might not allow). E.g., unionfs might then want
its namecache nodes to shadow two lower nodes in some sense.
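
Just to show what "shadowing two lower nodes" would demand from the
struct, a rough sketch follows. Everything in it is an assumption made up
for illustration (the nc_sketch struct, the nc_shadowed array, SHADOW_MAX,
and both helper functions); nothing like this exists in namecache.h.

```c
#include <assert.h>
#include <stddef.h>

#define SHADOW_MAX 2	/* assumption: enough for a two-branch union */

struct nc_sketch {
	const char       *nc_name;
	int               nc_resolved;
	struct nc_sketch *nc_shadowed[SHADOW_MAX]; /* hypothetical field */
	int               nc_nshadowed;
};

/* Record that ncp shadows lower; returns 0 on success, -1 if full. */
int
shadow_add(struct nc_sketch *ncp, struct nc_sketch *lower)
{
	assert(ncp != NULL && lower != NULL);
	if (ncp->nc_nshadowed >= SHADOW_MAX)
		return -1;
	ncp->nc_shadowed[ncp->nc_nshadowed++] = lower;
	return 0;
}

/* The upper entry can be in sync only if all it shadows is resolved. */
int
shadow_all_resolved(const struct nc_sketch *ncp)
{
	int i;

	for (i = 0; i < ncp->nc_nshadowed; i++) {
		if (!ncp->nc_shadowed[i]->nc_resolved)
			return 0;
	}
	return 1;
}
```

A fixed-size array in the generic struct is of course just one way to do
it; the point is only that a single typed pointer is not enough for this
case.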

>:We don't need to linearize them. We can use the same tree layout as in
>:the fs hierarchy: pointers in one direction, list of neighbours in the
>:other direction.
>     I would treat the VFS's as separate trees.  The only commonality
>     is the 'shadow group' that correlates a set of entries from separate
>     trees and indicates a relationship between them.
>     That is, there are two distinct data structures here... there is the
>     normal namespace tree, and then there is shadow group.  They must
>     be considered separately.

I think everybody agrees with that, and no one wanted to intermingle the
fs hierarchy with the overlay hierarchy in any way. I just said they could
be built from the same kind of abstract data structures.

>     I'm a little confused over 'shadowroot' and 'shadowlink'.  How about
>     'shadowinfo' and 'shadowlink' ?  The shadowinfo would be a common
>     structure that all entries in a particular shadow group would refer
>     to, which would contain the lock.  shadowlink would be the topological
>     glue to link the namecache entries from the disparate trees which are
>     in the same shadow group together.
>     Shadowlink could very easily just be a CIRCULAR singly linked list.
>     If the only operation that needs to be done on parents is an 
>     invalidation, and the lock is shared, then one can simply iterate
>     forwards until one wraps to the highest layer, and continue iterating
>     invalidating namecache records until the original entry is encountered
>     again.

What does "highest layer" mean? How do we have a hierarchy if we only
know of sharing the lock?
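
To show where the question comes from, here is how I read the circular
list, as a rough sketch. The struct layouts and the names
shadowinfo_sketch, nc_sketch, sh_locked and shadow_inval_others are my
assumptions, and the shared lock is reduced to a plain flag.

```c
#include <assert.h>
#include <stddef.h>

struct shadowinfo_sketch {
	int sh_locked;	/* stand-in for the lock shared by the group */
};

struct nc_sketch {
	int                       nc_resolved;
	struct shadowinfo_sketch *nc_shadowinfo;
	struct nc_sketch         *nc_shadowlink; /* next in ring, self if alone */
};

/*
 * Walk the ring from start until we are back at start, invalidating
 * every other member.  Returns the number of entries invalidated.
 */
int
shadow_inval_others(struct nc_sketch *start)
{
	struct nc_sketch *ncp;
	int n = 0;

	assert(start->nc_shadowinfo->sh_locked);
	for (ncp = start->nc_shadowlink; ncp != start;
	    ncp = ncp->nc_shadowlink) {
		ncp->nc_resolved = 0;
		n++;
	}
	return n;
}
```

The walk can only treat all other members alike. To start invalidating
only after the list "wraps to the highest layer", each entry would need
some layer index or flag, and the ring plus the shared lock do not carry
one.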

