lkwt in DragonFly

Bosko Milekic bmilekic at technokratis.com
Sat Feb 7 14:03:10 PST 2004


On Sat, Feb 07, 2004 at 01:13:19PM -0800, Matthew Dillon wrote:
> :> RCU = Remote Copy Update, right? Just asking, so I can find the right
> :> paper to read more about it.
> :
> :Read-Copy-Update. It's a read-mostly algorithm that's used to cluster writes/changes
> :to data structures, so that it minimizes bus traffic across processors.
> :
> :http://lse.sourceforge.net/locking/rcupdate.html
> :
> :What Matt is talking about is merging a kind of quiescence operation into
> :dfBSD tokens, if I understand correctly.
> :
> :bill
> 
>     Right.  My new idea for the token API gives us a scheduler-level
>     serialization abstraction that works across the whole system.  It
>     would kinda be like having multiple 'MP' locks in that it would have
>     the same sort of side effects as an MP lock.
> 
>     The traditional serialization abstraction is the concept of a 'mutex',
>     which is what FreeBSD-5 uses.  Linux tries to avoid mutex overheads
>     by using RCU (anyone can read data without having to worry about it
>     being changed out from under them, but modifications must be synchronized
>     across all cpus at a scheduler boundary).  Right now we try to avoid
>     mutex-like overheads through our per-cpu model (e.g. LWKT scheduler
>     and SLAB allocator) and through our per-cpu tokens.
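
As a concrete illustration of the read-mostly pattern being referred to, here is a
minimal user-space sketch of read-copy-update.  The names and the immediate free()
are illustrative assumptions for a single-threaded model, not the Linux RCU API:

/*
 * Toy model of the RCU idea: readers take no lock at all, a writer
 * publishes a new copy of the structure and defers freeing the old
 * one until no reader can still be looking at it (in a real kernel,
 * after a quiescent period at the scheduler level).
 */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct route {
    int ifindex;
    int metric;
};

static _Atomic(struct route *) current_route;

/* Reader path: a plain acquire load, no bus-locked instruction. */
static int
route_metric(void)
{
    struct route *r = atomic_load_explicit(&current_route,
        memory_order_acquire);
    return (r->metric);
}

/* Writer path: copy, modify the copy, publish, defer the free. */
static void
route_set_metric(int metric)
{
    struct route *oldp = atomic_load(&current_route);
    struct route *newp = malloc(sizeof(*newp));

    memcpy(newp, oldp, sizeof(*newp));
    newp->metric = metric;
    atomic_store_explicit(&current_route, newp, memory_order_release);

    /*
     * Real RCU would wait here, or queue a callback, until every cpu
     * has passed through a quiescent (scheduler) point before freeing
     * oldp.  This single-threaded model frees immediately.
     */
    free(oldp);
}

int
main(void)
{
    struct route *r = malloc(sizeof(*r));

    r->ifindex = 1;
    r->metric = 10;
    atomic_store(&current_route, r);

    route_set_metric(20);
    printf("metric: %d\n", route_metric());
    return (0);
}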
> 
>     The problem with our current token abstraction is very similar to the
>     problem FreeBSD-5 has with its mutex abstraction.  In our case, a 
>     token can be 'lost' when a thread blocks or gets interrupted outside
>     of a critical section and the thread must carefully code re-acquisition
>     of potentially lost tokens because of this.  In FreeBSD-5's case a mutex
>     is not allowed to be held across a blocking operation at all and so
>     the use of mutexes must be carefully coded... adding significant complexity
>     to deeply nested APIs.  Mutexes also have lock ordering issues (deadlock
>     potential).
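
For contrast, here is a small user-space sketch (pthreads rather than the FreeBSD-5
kernel API) of the "cannot hold the lock across a blocking operation" pattern
described above, and the revalidation burden it imposes:

#include <pthread.h>
#include <stdio.h>

struct queue {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             count;
    int             items[8];
};

/* Consumer: blocks until an item is available. */
static int
queue_take(struct queue *q)
{
    int item;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0) {
        /*
         * The lock cannot be held across the blocking operation;
         * pthread_cond_wait() drops it atomically and reacquires it
         * on wakeup.  Anything read under the lock before the wait
         * may be stale afterwards and must be revalidated - the same
         * unwinding/reacquisition burden that propagates up through
         * deeply nested APIs when mutexes are used.
         */
        pthread_cond_wait(&q->cv, &q->lock);
    }
    item = q->items[--q->count];
    pthread_mutex_unlock(&q->lock);
    return (item);
}

static void
queue_put(struct queue *q, int item)
{
    pthread_mutex_lock(&q->lock);
    q->items[q->count++] = item;
    pthread_cond_signal(&q->cv);
    pthread_mutex_unlock(&q->lock);
}

int
main(void)
{
    struct queue q = { .count = 0 };

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.cv, NULL);
    queue_put(&q, 42);
    printf("took %d\n", queue_take(&q));
    return (0);
}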
> 
>     My new token idea takes the original per-cpu idea and abstracts it
>     into a serialization model.  We could, in fact, even support 'reader'
>     and 'writer' refs to tokens in order to allow multiple readers to
>     hold the same token at the same time.  The key difference between 
>     the new token idea and, say, a standard lock, is that the token is
>     only in effect while the thread that acquired it is running.  When
>     the thread blocks or switches away the tokens that thread has acquired
>     can be borrowed by other threads (on the same or on other cpus) and
>     must be returned before the original thread can resume.  
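
A toy, single-threaded model of that borrow/return rule may help: a token is only
in effect while its holder is running, and the scheduler can lend it out while the
holder is blocked.  The names and fields below are illustrative, not DragonFly's
implementation:

#include <assert.h>
#include <stdio.h>

struct thread {
    const char *name;
};

struct token {
    struct thread *owner;   /* thread currently holding the token */
};

/* Acquire: in the real design this would spin/IPI until available. */
static void
token_acquire(struct token *tok, struct thread *td)
{
    assert(tok->owner == NULL);
    tok->owner = td;
}

/* Called when the holder blocks: the token becomes lendable. */
static struct thread *
token_lend(struct token *tok, struct thread *borrower)
{
    struct thread *original = tok->owner;

    tok->owner = borrower;
    return (original);
}

/* Called before the original holder is allowed to run again. */
static void
token_return(struct token *tok, struct thread *original)
{
    tok->owner = original;
}

int
main(void)
{
    struct thread a = { "A" }, b = { "B" };
    struct token tok = { NULL };
    struct thread *saved;

    token_acquire(&tok, &a);            /* A runs holding the token   */
    saved = token_lend(&tok, &b);       /* A blocks; B borrows it     */
    printf("borrowed by %s\n", tok.owner->name);
    token_return(&tok, saved);          /* returned before A resumes  */
    printf("held again by %s\n", tok.owner->name);
    return (0);
}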

   How do you determine whether another CPU holds the token when your
   thread on the current cpu wants/needs it?  Or do you always
   synchronously IPI the other cpu when you want a token for writing?

   If you always IPI for a write, then all accesses to data structures
   protected by tokens need to be classified into either reads or
   writes.

   What about if you have multiple CPUs holding a token for reading only
   and then one comes along and wants to have the token for writing?
   What exactly happens?  Do you have a way of figuring out whether
   another CPU holds the token for reading and an IPI is required, or do
   you always IPI everyone?

>     So, unlike mutexes or locks, a token only guarantees run-time
>     serialization.  A token also has no deadlock issues and no lock
>     ordering issues at all.  Like mutexes, code that uses tokens must
>     be aware of potential blocking conditions (because atomicity is lost
>     across a blocking operation), but unlike mutexes the code that
>     uses tokens does not have to release the tokens across the blocking
>     operation.  I believe this to be a VERY IMPORTANT feature of the
>     new token design because it means we can (A) hold multiple tokens
>     across a blocking situation, (B) higher level procedures do not need
>     to pass token references to lower level procedures that 'might block',
>     they need only be aware that the lower level procedure might block,
>     and (C) the new token abstraction leaves open the possibility of 
>     major implementation optimizations because it devolves nearly into
>     a NOP on UP systems.  e.g. we can support per-cpu 'caching' of the 
>     token after it has been released, allowing us to avoid locked bus
>     cycles as well as memory barriers in the cache case.
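
Here is a sketch of the caller-visible property claimed in (A) and (B): tokens are
simply held across a call into code that might block, and the lower-level routine
takes no token references.  The lwkt_gettoken()/lwkt_reltoken()-style names and the
stub bodies are assumptions for illustration only:

#include <stdio.h>

struct token { int held; };

static void lwkt_gettoken(struct token *tok) { tok->held = 1; }
static void lwkt_reltoken(struct token *tok) { tok->held = 0; }

static struct token vnode_token, mount_token;

/*
 * A lower-level routine that might block.  It takes no token
 * references; callers only need to know it is a blocking point,
 * because atomicity (not the tokens themselves) is what they lose
 * across the sleep.
 */
static void
read_block_from_disk(void)
{
    /* tsleep()/biowait() would go here in real kernel code. */
}

static void
lookup_and_read(void)
{
    lwkt_gettoken(&mount_token);
    lwkt_gettoken(&vnode_token);

    /*
     * Both tokens stay held across the blocking call; the caller only
     * has to remember that cached state may have changed while it was
     * asleep, not to unwind and reacquire the tokens.
     */
    read_block_from_disk();

    lwkt_reltoken(&vnode_token);
    lwkt_reltoken(&mount_token);
}

int
main(void)
{
    lookup_and_read();
    printf("tokens released: %d %d\n", vnode_token.held, mount_token.held);
    return (0);
}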

  So this means you will have locked bus cycles for token acquisitions
  UNLESS the token was last held by the current CPU?  How do you
  determine whether the token is cached locally?  A critical section
  protecting the check of a flag so as to ensure an incoming IPI does
  not preempt your check?
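
  For what it's worth, here is the kind of fast path I imagine, sketched in user
  space: the check of a per-cpu "cached here" flag sits inside a critical section
  so an incoming IPI cannot revoke the cache between the check and the acquisition.
  crit_enter()/crit_exit() are stubbed, and the token fields, the atomic fallback
  and mycpu_id are assumptions, not the real implementation:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static int mycpu_id = 0;                 /* stand-in for the per-cpu id  */
static int crit_nest;                    /* stand-in critical section    */

static void crit_enter(void) { crit_nest++; }
static void crit_exit(void)  { crit_nest--; }

struct token {
    int         cached_cpu;              /* cpu that last owned the token */
    atomic_int  global_owner;            /* -1 when free                  */
};

static bool
token_try_fast(struct token *tok)
{
    bool ok;

    crit_enter();
    /*
     * Cheap case: the token is still cached on this cpu, so no locked
     * bus cycle or memory barrier is needed; the critical section is
     * what keeps an IPI from stealing it between check and use.
     */
    ok = (tok->cached_cpu == mycpu_id);
    crit_exit();
    return (ok);
}

static void
token_get(struct token *tok)
{
    int expected = -1;

    if (token_try_fast(tok))
        return;
    /*
     * Slow case: the token was last held elsewhere, so fall back to a
     * bus-locked operation (and, presumably, an IPI to the cpu that
     * has it cached in the real design).
     */
    while (!atomic_compare_exchange_weak(&tok->global_owner, &expected,
        mycpu_id))
        expected = -1;
    tok->cached_cpu = mycpu_id;
}

int
main(void)
{
    struct token tok = { .cached_cpu = 1, .global_owner = -1 };

    token_get(&tok);                     /* takes the slow path once     */
    printf("fast path now: %d\n", token_try_fast(&tok));
    return (0);
}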

>     So how does this tie into the RCU concept?  Simple, really... RCU
>     turns out to be a special case of the new token abstraction, that's
>     all.  In the new abstraction if you had a token T and every thread
>     in the system obtained 'read access' on T, then any thread which
>     then obtains 'write access' on T would be serialized against all
>     the threads with read access at the scheduler level.  If we wanted to
>     implement Linux's 'queued' RCU API then we would queue structural 
>     changes directly to a thread that obtains 'write access' on T
>     and this translates directly to updating the data during
>     quiescent-scheduler periods.
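
A toy model of that special case, with all names illustrative: every reader takes a
shared reference on T, writers queue their changes, and the single holder of the
write reference applies them once no read references remain - the analogue of a
quiescent-scheduler period:

#include <assert.h>
#include <stdio.h>

struct token {
    int readers;     /* threads currently holding read access  */
    int writer;      /* nonzero while write access is held     */
};

#define MAXQ 16
static int pending[MAXQ];  /* queued updates (here: new values)  */
static int npending;
static int shared_value;   /* the data structure protected by T  */

static void token_get_read(struct token *t)  { assert(!t->writer); t->readers++; }
static void token_rel_read(struct token *t)  { t->readers--; }
static void token_get_write(struct token *t) { assert(t->readers == 0 && !t->writer); t->writer = 1; }
static void token_rel_write(struct token *t) { t->writer = 0; }

/* A reader never blocks on the writer; it just reads. */
static int
read_value(struct token *t)
{
    int v;

    token_get_read(t);
    v = shared_value;
    token_rel_read(t);
    return (v);
}

/* Queue a structural change instead of applying it immediately. */
static void
queue_update(int newval)
{
    assert(npending < MAXQ);
    pending[npending++] = newval;
}

/* The write-token holder applies queued updates during quiescence. */
static void
apply_updates(struct token *t)
{
    token_get_write(t);
    for (int i = 0; i < npending; i++)
        shared_value = pending[i];
    npending = 0;
    token_rel_write(t);
}

int
main(void)
{
    struct token T = { 0, 0 };

    queue_update(42);
    apply_updates(&T);
    printf("value: %d\n", read_value(&T));
    return (0);
}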
> 
>     I actually don't like Linux's queued RCU API, I am just saying that
>     it could be implemented natively on top of the new token abstraction.
>     We would use our tokens in a more active manner.  e.g. instead of
>     holding a read token and queueing write updates to another entity that
>     holds the write token we would obtain a write token and do the 
>     update in real time, then release the write token.  Nothing prevents
>     us from doing it both ways, though :-)
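
The "active" style, by contrast, is just serialize-update-release; a minimal sketch
with the same kind of illustrative stand-ins:

#include <stdio.h>

struct token { int writer; };           /* illustrative only */

static int shared_value;

static void token_get_write(struct token *t) { t->writer = 1; }
static void token_rel_write(struct token *t) { t->writer = 0; }

/* Obtain write access, update the structure in real time, release. */
static void
set_value(struct token *t, int newval)
{
    token_get_write(t);
    shared_value = newval;
    token_rel_write(t);
}

int
main(void)
{
    struct token T = { 0 };

    set_value(&T, 42);
    printf("value: %d\n", shared_value);
    return (0);
}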
> 
>     So what does this mean?  This means that the new token abstraction
>     will not be a replacement for standard reader/writer (lockmgr) locks,
>     but it certainly would be an excellent replacement for mutexes
>     because it has far fewer side effects, requires less knowledge of
>     the lock state to be passed down the subroutine chain, and is far more
>     flexible.
> 
> 					-Matt
> 					Matthew Dillon 
> 					<dillon at xxxxxxxxxxxxx>

  While the benefits are great - not having to account for how many
  tokens are held (and how far up your potentially deeply nested API
  they were acquired) when blocking, and not having to manually unwind
  locks before sleeping and rewind them afterwards - can you perhaps
  clarify what the typical costs of acquiring a token for reading and
  for writing will be?  In most cases, from what you said, I'm
  expecting a typical acquisition to consist of a critical section at
  best (for a read, or if the token is cached on the local cpu),
  sometimes a bus-locked instruction, and sometimes IPIs?  Pardon the
  ignorance, but not yet having looked at your implementation, it is
  still unclear to me what a typical token acquisition/release will
  look like in DragonFly.

Cheers,
-- 
Bosko Milekic  *  bmilekic at xxxxxxxxxxxxxxxx  *  bmilekic at xxxxxxxxxxx
TECHNOkRATIS Consulting Services  *  http://www.technokratis.com/




