Machine unresponsive with cache_lock: blocked on... message
    Matthew Dillon 
    dillon at apollo.backplane.com
       
    Sat Mar 13 18:34:03 PST 2010
    
    
  
:> :Warning: pmap_interlock 00010003
:> :Warning: pmap_interlock 00010003
:> :
:> :No cache_lock stuff this time.
:> 
:>     It looks like a deadlock between the VM system and HAMMER due to
:>     kmalloc() blocking during a low memory situation.  Your system
:>     was paging heavily at the time and was low on free memory.
:> 
:>     I have committed a change to master which should fix this particular
:>     issue.
:
:I got a new one.
:Same symptoms, many pmap_interlock messages on the console.
:
:System has been updated to
:DragonFly v2.5.1.960.g7a6ce-DEVELOPMENT #9: Thu Mar 11 10:22:12 CET 2010
:...
:-- 
:Francois Tigeot
    
    This one revealed a different and very interesting MP race which
    caused a shared<->exclusive deadlock.

    What is happening is that two threads are competing for an exclusive
    lock on a HAMMER structure in order to load a data buffer from disk.
    This occurs in the middle of the B-Tree lookup code, which relies on
    shared locks.  When I coded it up I assumed it would be safe to
    acquire an exclusive lock because there was no data loaded yet, so
    there could not be any other consumers.

    But I was wrong.  If two threads compete to load the data buffer
    then one thread will win and start using that buffer and surrounding
    buffers with shared locks, while the other thread will be holding
    shared locks on other buffers while attempting to acquire an exclusive
    lock on the new buffer (for which the first thread has already loaded
    the data).  The result is a deadlock.
    The solution is to use an interlock instead of an exclusive lock to
    test whether the structure needs data loaded from the drive.  This
    way, if there is a race, the second thread will not deadlock trying
    to get the interlock while other threads hold the structure shared.
    It is going to take me a day or two to test the fix for this.
					-Matt
					Matthew Dillon 
					<dillon at backplane.com>
    
    