Machine unresponsive with cache_lock: blocked on... message
Matthew Dillon
dillon at apollo.backplane.com
Sat Mar 13 18:34:03 PST 2010
:> :Warning: pmap_interlock 00010003
:> :Warning: pmap_interlock 00010003
:> :
:> :No cache_lock stuff this time.
:>
:> It looks like a deadlock between the VM system and HAMMER due to
:> kmalloc() blocking during a low memory situation. Your system
:> was paging heavily at the time and was low on free memory.
:>
:> I have committed a change to master which should fix this particular
:> issue.
:
:I got a new one.
:Same symptoms, many pmap_interlock messages on the console.
:
:System has been updated to
:DragonFly v2.5.1.960.g7a6ce-DEVELOPMENT #9: Thu Mar 11 10:22:12 CET 2010
:...
:--
:Francois Tigeot
This one revealed a different and very interesting MP race which
caused a shared<->exclusive deadlock.
What is happening is that two threads are competing for an exclusive
lock on a HAMMER structure in order to load a data buffer from disk.
This occurs in the middle of the B-Tree lookup code which relies on
shared locks. When I coded it up I assumed it would be safe to
acquire an exclusive lock because there was no data loaded yet, so
there would not have been other consumers.
But I was wrong. If two threads race to load the data buffer, one
thread wins and starts using that buffer and surrounding buffers
with shared locks. The other thread, still holding shared locks on
other buffers, then tries to acquire an exclusive lock on the new
buffer (whose data the first thread has already loaded). The result
is a deadlock.
The solution is to use an interlock instead of an exclusive lock to
test whether the structure needs data loaded from the drive or not.
This way if there is a race the second thread will not deadlock trying
to get the interlock when other threads hold the structure shared.
It is going to take me a day or two to test the fix for this.
-Matt
Matthew Dillon
<dillon at backplane.com>