HAMMER lockup

Matthew Dillon dillon at apollo.backplane.com
Mon Jun 30 12:46:40 PDT 2008

:>     HAMMER is reserving space in the strategy_write() code and must
:> also allocate a memory-record to placemark the operation.  This means
:> HAMMER must do various getblks and breads.  The buffer cache is
:> supposed to have enough clean buffers to satisfy those operations but
:> sometimes it doesn't.
:> 					-Matt
:> 					Matthew Dillon 
:> 					<dillon at backplane.com>
:(you forgot CC'ing users@)
:Thanks, I'll try it out as soon as you commit it.
:By the way, for some reason rtorrent is a great filesystem testing
:application. There has been an ext3 bug caught with it and I also
:caught a bug in FreeBSD's ZFS implementation with rtorrent, and now
:HAMMER... :-D
:Gergo Szakal MD <bastyaelvtars at gmail.com>

     If you could email me your rtorrent rc and a config file I can use
     to test with I'd appreciate it.

     I'm making good progress.  I've fixed another 3 deadlocks in my
     local tree and I will commit them tonight.  I can't commit them now
     because one of the fixes also involved a major rewrite of the low
     level storage allocator.  The media format is still the same, but
     I had to carefully reorder the way the blockmap lock is handled
     and the change is too dangerous to commit without at least a good
     day's worth of testing.

     I was scratching my head wondering how, with all the work I have done,
     the buffer cache could STILL get stuck in "newbuf".  Turns out I was
     chasing my tail.  I had changed HAMMER's VOP_WRITE last week to not
     block if there were too many dirty buffers in the buffer cache
     when called from the pageout daemon.  The idea was that not blocking
     would prevent HAMMER from deadlocking the pageout daemon.  The
     result was that the pageout daemon happily queued up so many dirty
     pages that the buffer cache filled up with dirty buffers and
     deadlocked against other processes trying to read data from disk
     instead.  HAMMER needs to be able to issue I/O reads in order to
     reserve the space needed for the writes so, boom, it deadlocked.
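
     A toy model of the failure described above may make the mechanism
     easier to see.  It is only a sketch and not HAMMER code: NBUF,
     DIRTY_LIMIT and both function names are invented for illustration,
     and it assumes that flushing a dirty buffer first needs one clean
     buffer for a read (standing in for the reads HAMMER issues to
     reserve space).

```python
# Toy model of the buffer-cache deadlock described above. Not HAMMER code:
# NBUF, DIRTY_LIMIT and all function names are invented for illustration.

NBUF = 8          # total buffers in the toy cache
DIRTY_LIMIT = 4   # a throttled writer blocks once this many buffers are dirty


def queue_writes(requests, throttle):
    """Queue `requests` dirty pages; return how many buffers end up dirty.

    With throttle=True the writer stops at DIRTY_LIMIT (it would block).
    With throttle=False it keeps going until no clean buffers remain.
    """
    limit = DIRTY_LIMIT if throttle else NBUF
    dirty = 0
    for _ in range(requests):
        if dirty >= limit:
            break  # a real writer would sleep here instead of giving up
        dirty += 1
    return dirty


def flush_all(dirty):
    """Try to flush every dirty buffer; return the number left dirty.

    Assumption of this model: flushing one buffer first needs one clean
    buffer for a read. If no clean buffer exists the flush cannot start,
    and since nothing else cleans buffers, the system is stuck.
    """
    while dirty > 0:
        clean = NBUF - dirty
        if clean == 0:
            return dirty  # stuck waiting for a buffer: the deadlock
        dirty -= 1        # read done, write issued, buffer becomes clean
    return dirty


if __name__ == "__main__":
    for throttle in (True, False):
        dirty = queue_writes(100, throttle)
        left = flush_all(dirty)
        print(f"throttle={throttle}: queued {dirty} dirty, {left} stuck")
```

     With the throttle in place some buffers always stay clean, so the
     reads needed by the flush can proceed.  Without it every buffer
     ends up dirty and nothing can ever be flushed.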

     So now I've fixed that, but it means I have to deal with potential
     vnode deadlocks.  If I focus on getblk()/bread() not getting stuck
     in "newbuf", that should break the chain reaction, and VOP_READ will
     then not get stuck either.  But there may still be cases where a
     kmalloc() gets stuck while holding a vnode lock, which then prevents
     the pageout daemon from paging out pages from that vnode.
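
     The remaining kmalloc() case can be pictured as a cycle in a
     wait-for graph.  The sketch below is illustrative only: the node
     labels are plain strings describing the scenario above, not kernel
     objects, and find_cycle() is a hypothetical helper that assumes
     each party waits on at most one thing.

```python
# Wait-for graph sketch. Node names are descriptive labels, not kernel objects.

def find_cycle(waits_for):
    """Return the nodes forming a cycle in a wait-for graph, or None.

    `waits_for` maps each waiter to the single thing it is waiting on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node in seen:
                # Close the loop so the cycle reads start -> ... -> start.
                return path[path.index(node):] + [node]
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


# kmalloc() sleeps for memory while its caller holds the vnode lock; memory
# is freed by the pageout daemon, which needs that same vnode lock.
stuck = {
    "kmalloc caller (holds vnode lock)": "free memory",
    "free memory": "pageout daemon",
    "pageout daemon": "vnode lock",
    "vnode lock": "kmalloc caller (holds vnode lock)",
}

# Same daemon, but the lock holder is not waiting on memory: no cycle.
fine = {
    "pageout daemon": "vnode lock",
    "vnode lock": "reader (not blocked)",
}

if __name__ == "__main__":
    print(find_cycle(stuck))
    print(find_cycle(fine))
```

     Breaking any one edge, for example by never sleeping for memory
     while the vnode lock is held, removes the cycle.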

     It's a big merry-go-round involving careful attention to what
     locks are needed for what operation.  I feel like I've been working
     on this problem for 10+ years now :-(.

					Matthew Dillon 
					<dillon at backplane.com>
