git: dsched_fq - Overhaul locking

Matthew Dillon dillon at apollo.backplane.com
Sun Apr 18 10:10:27 PDT 2010


:    * NOTE: this is an _attempt_ to fix some unidentified deadlocks that have
:      been reported occasionally. While it shouldn't happen, be aware that
:      this might explode.
:    
:    Reported-by: Antonio Huete, Jan Lentfer
:
:Summary of changes:
: sys/dsched/fq/dsched_fq.h      |   22 ++++++++++++++--------
: sys/dsched/fq/dsched_fq_core.c |   17 ++++-------------
: 2 files changed, 18 insertions(+), 21 deletions(-)
:
:http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/aedbaf3ba5edf2f3c33496c3c542c8333943bca7
:
:-- 
:DragonFly BSD source repository

    Yah, I got one of those deadlocks too.  I tried turning on dsched
    on my test box while running the corruption test and it locked up
    (no ddb, no dump, no nothing... complete freeze) fairly quickly.

    I will try again on Monday w/ the new patches.

    I will note one thing with regard to dropping references on
    dpriv->refcount.  Handling the 1->0 transition without a lock implies
    that no new references can be obtained on the object while it is
    being deallocated.  That is, nothing may race the 1->0 transition of
    the refcount while the lock is NOT held, for example other routines
    finding the object on its lists and taking a new reference.  Is that
    true?

    If not, you may want to recode it like this:

    for (;;) {
	int refcount = dpriv->refcount;

	if (refcount == 1) {
	    spin_lock_wr(...);	/* the lock protecting the lists */
	    if (atomic_cmpset_int(&dpriv->refcount, 1, 0)) {
		... remove from lists ...
		... etc (the lock must be released on this path too) ...
		break;
	    }
	    ...
	    spin_unlock_wr(...);
	} else {
	    KKASSERT(refcount >= 0 || refcount <= -0x400);
	    if (atomic_cmpset_int(&dpriv->refcount, refcount, refcount-1))
		break;
	}
	/* retry */
    }
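    For illustration only, here is a userland C11 sketch of the same
    retry loop.  It assumes a pthread mutex in place of the kernel
    spinlock and atomic_compare_exchange_strong() in place of
    atomic_cmpset_int(); struct node, list_lock, node_lookup_ref() and
    node_unref() are hypothetical names, not dsched code, and the
    negative range permitted by the KKASSERT above is not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for dpriv and the list it lives on. */
struct node {
    atomic_int refcount;
    struct node *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *list_head;

/* Take a new reference via the list; only done with list_lock held. */
static struct node *node_lookup_ref(void)
{
    struct node *n;

    pthread_mutex_lock(&list_lock);
    n = list_head;
    if (n != NULL)
        atomic_fetch_add(&n->refcount, 1);
    pthread_mutex_unlock(&list_lock);
    return n;
}

/* Unlink n from the list; caller holds list_lock. */
static void node_unlink_locked(struct node *n)
{
    struct node **pp;

    for (pp = &list_head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            break;
        }
    }
}

/*
 * Drop a reference.  The 1->0 transition is only attempted with
 * list_lock held, so it cannot race node_lookup_ref(); all other
 * decrements are lockless compare-and-swaps.  Returns 1 if freed.
 */
static int node_unref(struct node *n)
{
    for (;;) {
        int refcount = atomic_load(&n->refcount);

        if (refcount == 1) {
            int expected = 1;

            pthread_mutex_lock(&list_lock);
            if (atomic_compare_exchange_strong(&n->refcount, &expected, 0)) {
                node_unlink_locked(n);
                pthread_mutex_unlock(&list_lock);
                free(n);    /* unreachable from the list, count is 0 */
                return 1;
            }
            /* A lookup took a new reference meanwhile; retry. */
            pthread_mutex_unlock(&list_lock);
        } else {
            assert(refcount >= 2);  /* caller still holds a reference */
            if (atomic_compare_exchange_strong(&n->refcount, &refcount,
                                               refcount - 1))
                return 0;
        }
        /* retry */
    }
}
```

    The point of the design is that lookups only gain a reference while
    holding the lock.  A lookup therefore either runs before the 1->0
    attempt (bumping the count to 2, so the compare-and-swap fails and
    the releaser retries) or after the object is unlinked, in which case
    it never sees the object at all.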

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>