token passing & IPIs

Matthew Dillon dillon at apollo.backplane.com
Sat Jul 19 01:10:16 PDT 2003


:	It's the lock-less SMP stuff that Linux is using.  The idea
:	originated from IBM's Paul McKenney; and Rusty Russel of Linux
:	Camp actual made it work for Linux 2.4.x and later versions [2]
:
:	It will be a good idea to do lock-less file system lookups,
:	since it is a lookup, there is no write to it, thus we can avoid
:	locks as much as we can.  So can we for device state tables, and
:	such.
:
:	They have seen a significant scalability gain according to the
:	results, and the basic theory behind how
:	Read-Copy-Update/Lock-Less works [1].
:
:	[1] - http://lse.sourceforge.net/locking/dcache/dcache.html
:	      http://www.rdrop.com/users/paulmck/rclock/intro/rclock_intro.html
:	      
:	[2] - http://lse.sourceforge.net/locking/rcupdate.html
:
:	Cheers.
:	
:		-- Hiten (hmp at xxxxxxxxxxx)

    Ok, I now understand read-copy-update.  It is a very interesting approach
    and, indeed, it is very similar to MESI but taylored to a strict SMP
    environment.  I don't think I like the synchronization methodology of
    waiting for all the cpus to go through a scheduling switch to synchronize
    the update, but it is certainly an innovative solution that avoids having
    to message the other cpus and it does allow a single 'shared' cache.

    I think the MESI model is a better abstraction because it can be used
    to manage a single shared cache as well as multiple independant (but
    coherent) caches.  The RCU method only works with a single shared cache
    topology.  But the problem remains: how to synchronize updates
    and invalidations.  In particular, how to update the globally shared
    linkages between cached objects.

    Hmm.  I think it's possible to do without creating stall points.  If
    the global cache looks something like this:

	A->B->C

    then when you want to replace B with Bnew you would obtain a mutex and
    then hang Bnew off a dedicated pointer in B, like this:

	A->B--->C
	   |    ^
	   V    |
	   Bnew-+

    So far so good.  Now in order to be able to change the A->B linkage
    to A->Bnew, which would complete the update, you have to be sure
    that all cpus see the B->new_ptr field as non-NULL.  Once you know
    they see B->new_ptr as non-NULL you can safely change A->B to A->Bnew.
    Other cpus will either traverse A->B->Bnew, or they will see
    A->Bnew, so it doesn't matter when they see the A->B to A->Bnew update.

    Then once you know that all cpus see A->Bnew instead of A->B, and
    that no other cpus are inside the old B any more, you can free the old B.

    So in this scheme we have two 'synchronization' points we must resolve.
    The first synchronization can be resolved with a combination of a
    memory barrier and making B->new_ptr a volatile field.  The second
    synchronization point can be resolved by delaying freeing the old B,
    which could work by counting thread switches until all the cpus have
    switched at least once... but it wouldn't have to stall, it could be
    entirely passive.

    I'm sure I can think of other ways of doing it that aren't as costly
    as the RCU algorithm which fit into both a single shared global cache
    MESI model and a multiple-independant-caches MESI model.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>





More information about the Kernel mailing list