system semi-freezes on mbuf cluster limit

Wed May 23 09:32:01 PDT 2007

:I just experienced a nasty situation:  I ran out of mbuf clusters (6656)
:and ppp was one of the processes stuck in objcache_get.
:
:even after some clusters drained (from netstat -m output), the objcache
:depot didn't get free entries back and ppp stayed stuck.  and of course
:because of this no mbuf clusters were freed (ppp would have to transmit
:them, i guess).  I was doing some serious down/uploading at the moment.
:
:this should not happen, or at least more gracefully.
:
:cheers
:  simon

    I was debugging something similar earlier this month.  Basically
    what can happen is that if a machine is running a lot of simultaneous
    TCP connections, particularly outgoing connections which may build up
    a lot of data in the socket buffers, the machine can hit its mbuf
    cluster limit.

    Is that what is happening to you?  Lots of outgoing tcp connections
    with lots of data backed up (netstat -tn | fgrep tcp4)?  I want to
    make sure it isn't an mbuf leak.
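
    A rough way to answer that question is to total the Send-Q column
    for the tcp4 sockets.  The sketch below assumes the usual BSD
    netstat -tn column layout (Proto, Recv-Q, Send-Q, Local Address,
    Foreign Address, state); the function name backed_up_tcp4 is made
    up for illustration and is not part of any existing tool.

```python
def backed_up_tcp4(netstat_output, min_send_q=1):
    """Summarize queued outgoing data for tcp4 sockets.

    netstat_output is the text printed by `netstat -tn`.
    Returns (total_send_q, rows), where rows is a list of
    (send_q, local_address, foreign_address) tuples sorted with the
    largest backlog first. Sockets below min_send_q bytes are skipped.
    """
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign [state]
        if len(fields) < 5 or fields[0] != "tcp4":
            continue
        try:
            send_q = int(fields[2])
        except ValueError:
            # Not a data row (header or unexpected format); ignore it.
            continue
        if send_q >= min_send_q:
            rows.append((send_q, fields[3], fields[4]))
    rows.sort(reverse=True)
    total = sum(row[0] for row in rows)
    return total, rows
```

    Feed it the saved output of netstat -tn.  A large total spread over
    many connections points at socket buffer backlog; exhausted
    clusters with a small total would be more suggestive of a leak.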

    When the cluster limit is reached, the sheer demand for packets
    prevents the system from being able to recover mbufs.  Eventually the
    tcp connections start timing out and freeing all of their mbufs, and
    the machine then recovers.

    At the moment the only real solution is to increase the number of mbufs
    at boot time (set kern.ipc.nmbclusters and kern.ipc.nmbufs in
    /boot/loader.conf).
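
    As a sketch, the loader.conf entries would look something like the
    fragment below.  The numbers are placeholders chosen only to show
    the syntax, not recommendations; pick values based on the machine's
    memory and on the peak usage reported by netstat -m.

```
# /boot/loader.conf
# Illustrative values only (assumed, not tuned for any real system).
kern.ipc.nmbclusters="16384"
kern.ipc.nmbufs="32768"
```

    The new limits take effect on the next boot.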

    One thing that would be nice would be to have some sort of algorithm,
    similar to what Linux has, where it detects the mbuf load on the system
    and reduces the amount of data it allows the tcp connections to build
    up dynamically, resulting in more graceful degradation.
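
    To make the idea concrete, here is a minimal sketch of one possible
    policy: leave the per-connection send buffer cap alone while cluster
    usage is low, then shrink it linearly as usage approaches the limit.
    This is not what Linux does and not anything in the kernel today;
    the function name, the 50% starting point, and the byte values are
    all assumptions picked for illustration.

```python
def sockbuf_limit(clusters_in_use, cluster_limit,
                  max_bytes=65536, min_bytes=4096, start_fraction=0.5):
    """Return a per-connection send-buffer cap based on mbuf cluster load.

    Below start_fraction of the cluster limit the cap stays at max_bytes.
    Above it, the cap falls linearly and reaches min_bytes when every
    cluster is in use. All parameters are hypothetical tuning knobs.
    """
    if cluster_limit <= 0:
        raise ValueError("cluster_limit must be positive")
    if not 0.0 <= start_fraction < 1.0:
        raise ValueError("start_fraction must be in [0, 1)")

    # Fraction of clusters in use, clamped to the range [0, 1].
    load = min(max(clusters_in_use / cluster_limit, 0.0), 1.0)
    if load <= start_fraction:
        return max_bytes

    # 0.0 at start_fraction, 1.0 when the cluster limit is reached.
    pressure = (load - start_fraction) / (1.0 - start_fraction)
    return int(max_bytes - pressure * (max_bytes - min_bytes))
```

    With the 6656 cluster limit from the report, this would leave
    connections at the full 64KB cap until 3328 clusters are in use
    and squeeze them down to 4KB as the limit is reached, so the
    demand for clusters tapers off instead of hitting a wall.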

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>