[issue1554] nfs stall

Matthew Dillon dillon at apollo.backplane.com
Thu Oct 8 19:49:17 PDT 2009


:New submission from Thomas Nikolajsen <thomas.nikolajsen at mail.dk>:
:
:dfly 2.4.0
:
:'git gc' stalled (host didn't freeze) after some
:'nfs server .. not responding' / '.. is alive again'
:git repo on nfs mounted dir on local host.
:Other commands using nfs mount also stalls.
:
:Same experience with nfs mounted on remote host.
:(have seen this a few times over the last ~2 months,
:was guessing problem was network HW (although ping did respond))
:
:(did `shutdown' shortly before escape to debugger;
:it didn't seem to shutdown: returned to shell prompt;
:can do other core dump if needed)
:
:'git gc' did succeed using local fs (hammer) directly (no nfs).
:Can reproduce, as prev. state of git repo is in snapshot.
:
:Core dump *.39 uploading to leaf.

    You have a ton of NFS mounts here.  Hmm.  The NFS client is stuck
    waiting for a response from the NFS server (on the same host).  The
    NFS server processes (the nfsds) are stuck on a vnode lock on the
    HAMMER filesystem, waiting for the buffer cache.

    This looks like another HAMMER buffer cache exhaustion deadlock,
    again probably due to the 128M of RAM in the machine.  However,
    it looks like a different issue than the one from your other
    bug report.

    I dug into why HAMMER was stalling in the core dump and it looked
    like it shouldn't be stalling.  HAMMER was only reserving one buffer.
    The bufdaemon and bufdaemon_hw were both in wdrn1, which implies
    they were flushing data to disk.

    It could be that the issue here is not an actual deadlock but simply
    a great deal of disk write activity causing long stalls in the
    system.  Did you notice a significant amount of hard drive activity
    while the system was in this state?  The only thing you are running
    is the 'git gc'.  It could be that write activity from the git gc
    is creating long stalls and causing NFS to report the problem.
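
    One rough way to tell long write stalls apart from a real deadlock
    is to time small synchronous writes on the affected mount while the
    'git gc' runs.  The sketch below is only an illustration, not part
    of any DragonFly tooling: the function names, the "stallprobe-"
    file prefix and the 1-second stall threshold are all assumptions
    made for the example.  If the writes eventually complete with very
    high latencies, the system is probably slow rather than
    deadlocked; if a write never returns, that points back at a
    deadlock.

```python
import os
import tempfile
import time


def probe_write_latency(directory, samples=5, size=4096, interval=0.0):
    """Time small write+fsync cycles in `directory`; return latencies in seconds.

    Hypothetical helper for illustration only.
    """
    latencies = []
    payload = b"\0" * size
    # Scratch file on the filesystem under test (prefix is an arbitrary choice).
    fd, path = tempfile.mkstemp(dir=directory, prefix="stallprobe-")
    try:
        for _ in range(samples):
            start = time.monotonic()
            os.write(fd, payload)
            # fsync forces the data out, so the measured time includes
            # the round trip to the disk (or to the NFS server).
            os.fsync(fd)
            latencies.append(time.monotonic() - start)
            time.sleep(interval)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def summarize(latencies, stall_threshold=1.0):
    """Return (worst latency, number of samples at or above the threshold).

    The 1.0 second default is an assumed cutoff, not a measured one.
    """
    stalls = sum(1 for t in latencies if t >= stall_threshold)
    return max(latencies), stalls
```

    For example, summarize(probe_write_latency("/mnt/nfs", samples=60,
    interval=1.0)) would sample roughly once a second for a minute,
    where "/mnt/nfs" stands in for whatever mount point is stalling.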

    If that is the case, the issue is probably more one of HAMMER simply
    being massively inefficient due to the tiny buffer cache, but otherwise
    operating.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




