git: dsched_fq - Overhaul locking

Alex Hornung ahornung at gmail.com
Mon Apr 19 00:25:36 PDT 2010


:     Yah, I got one of those deadlocks too.  I tried turning on dsched
:     on my test box while running the corruption test and it locked up
:     (no ddb, no dump, no nothing... complete freeze) fairly quickly.
: 
:     I will try again on Monday w/ the new patches.

I've not been able to reproduce these deadlocks myself. The patch is pretty
much a blind attempt at fixing them, based only on reviewing the locking code.

In short, I pretty much have to rely on others trying it out and seeing if
they can still hit the deadlock, so any testing is welcome.


:     I will note one thing with regards to dereferencing dpriv->recount.
:     Handling the 1->0 transition without a lock implies that no new
:     references can be obtained on the object while it is being
:     deallocated.  (That is, no race can occur against the 1->0
:     transition of the refcount while the lock is NOT held).  For
:     example, by other routines accessing the object from its lists.
:     Is that true?

That is most definitely the case. In any case, if something were to go wrong
along these lines, an assert in the referencing code (fq_reference_*priv)
would be hit, as it checks that the object isn't in destruction (i.e. it
asserts that refcount >= 0).
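
To illustrate the idea, here is a rough userland sketch of that pattern. It is
not the actual dsched_fq code: struct obj, obj_reference(), obj_dereference()
and the OBJ_DESTROY_BIAS value are made-up names, and it uses C11 atomics
instead of the kernel's atomic ops. It also relies on the same assumption
discussed above, namely that nothing can look the object up and take a new
reference once the last one is being dropped; the assert only catches
violations of that, it does not prevent them.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bias subtracted on the final release so that the
 * count goes (and stays) negative while the object is torn down. */
#define OBJ_DESTROY_BIAS 0x40000000

/* Hypothetical stand-in for a refcounted private structure. */
struct obj {
    atomic_int refcount;
};

/* Take a reference. A negative count means the object is already in
 * destruction, so getting here at all is a bug and we assert. */
static void obj_reference(struct obj *o)
{
    int old = atomic_fetch_add(&o->refcount, 1);
    assert(old >= 0);
    (void)old;
}

/* Drop a reference without holding any lock. Returns 1 if this call
 * performed the 1->0 transition (the caller may then free the object),
 * 0 otherwise. */
static int obj_dereference(struct obj *o)
{
    int old = atomic_fetch_sub(&o->refcount, 1);
    assert(old > 0);
    if (old == 1) {
        /* Mark as "in destruction": any late obj_reference() will now
         * observe a negative count and trip its assert. */
        atomic_fetch_sub(&o->refcount, OBJ_DESTROY_BIAS);
        return 1;
    }
    return 0;
}
```

With a count initialised to 1, one obj_reference() followed by two
obj_dereference() calls leaves the count negative, and only the second
dereference reports the final release.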

Cheers,
Alex





