mbuf leak

Fri Apr 16 10:17:25 PDT 2004

I'm seeing an mbuf leak on several machines.  It started in the middle of
last week.  It was much worse last week than it is now; apparently a set
of commits to tcp_* on Monday fixed part (the gradual part) of the problem.
Now things run steadily for a day or two and then *suddenly* run out of
mbufs, no matter what I set nmbclusters to.
The last time, on the DNS server, the mbuf stats were
---------------
73/224/40000 mbufs in use (current/peak/max):
        72 mbufs allocated to data
        1 mbufs allocated to packet headers
64/118/10000 mbuf clusters in use (current/peak/max)
292 Kbytes allocated to network (0% of mb_map in use)
----------------
for several days, then, within 2 minutes:
----------------
2530/2544/40000 mbufs in use (current/peak/max):
        2530 mbufs allocated to data
2487/2488/10000 mbuf clusters in use (current/peak/max)
5612 Kbytes allocated to network (18% of mb_map in use)
---------------
(These two log entries were 2 minutes apart.  I could change the cron job
to run every minute, I guess.)
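The two snapshots can be sanity-checked with a little arithmetic.  The
sketch below is a hypothetical helper, not part of any existing script; it
assumes MSIZE = 256 and MCLBYTES = 2048 (typical i386 values) and that
netstat derives the "Kbytes allocated" line from the peak counts.  Under
those assumptions it reproduces both reported figures (292 KB at 0% and
5612 KB at 18% of mb_map), and it shows clusters in use jumping by 2423
between the two entries, roughly 20 clusters per second.

```python
import re

# Assumed i386 defaults (not stated in the report): 256-byte mbufs, 2 KB clusters.
MSIZE = 256
MCLBYTES = 2048

# Matches "cur/peak/max mbufs in use" and "cur/peak/max mbuf clusters in use".
LINE = re.compile(r"(\d+)/(\d+)/(\d+) (mbufs|mbuf clusters) in use")


def parse(snapshot):
    """Return {'mbufs': (cur, peak, max), 'mbuf clusters': (cur, peak, max)}."""
    out = {}
    for m in LINE.finditer(snapshot):
        out[m.group(4)] = tuple(int(m.group(i)) for i in (1, 2, 3))
    return out


def kbytes_and_percent(counts):
    """Recompute the 'Kbytes allocated (% of mb_map)' line from peak counts."""
    _, mb_peak, mb_max = counts["mbufs"]
    _, cl_peak, cl_max = counts["mbuf clusters"]
    used = mb_peak * MSIZE + cl_peak * MCLBYTES
    space = mb_max * MSIZE + cl_max * MCLBYTES
    return used // 1024, used * 100 // space


before = """73/224/40000 mbufs in use (current/peak/max):
64/118/10000 mbuf clusters in use (current/peak/max)"""
after = """2530/2544/40000 mbufs in use (current/peak/max):
2487/2488/10000 mbuf clusters in use (current/peak/max)"""

print(kbytes_and_percent(parse(before)))  # (292, 0)
print(kbytes_and_percent(parse(after)))   # (5612, 18)
# Clusters in use grew by 2423 between entries taken ~2 minutes apart.
print(parse(after)["mbuf clusters"][0] - parse(before)["mbuf clusters"][0])  # 2423
```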

I looked through the commit logs for most of last week and didn't see
anything obviously related.

I have log_in_vain on and there is NO flurry of activity there, nor is
there any sign of a SYN flood in netstat -an.  A fragment-based or similar
attack is possible, since it probably would not show up in either place.
Was there any change to the network stack last week that would cause
fragments not to be returned to the mbuf pool?

The two machines that show this are in different places on VERY different
networks.  One is a DNS and secondary mail server (this one is running out
of mbufs faster now) with a 100 Mbit backbone connection.  The other is
acting as a router for 5 other hosts, with a DSL line running to them
(effectively 2Mb/384k; the DSL is 6Mb/384k, but the router at the other end
has CAR limiting it to 2Mb).  That one has not shown the leak since Tuesday;
the DNS machine has, twice.  (I have a cron script that logs the output of
various netstat commands and reboots if mbuf cluster usage gets too high.)
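A check like the one in that cron script could look roughly like the
Python sketch below.  It is not the actual script: the 20% threshold, the
log path, the function names, and the shutdown message are all
hypothetical.  It parses only the "mbuf clusters in use" line of
netstat -m and reboots once current usage reaches the chosen fraction of
the cluster limit.

```python
import re
import subprocess
import time

# Hypothetical threshold: reboot once 20% of the cluster limit is in use.
THRESHOLD = 0.20

CLUSTERS = re.compile(r"(\d+)/(\d+)/(\d+) mbuf clusters in use")


def clusters_in_use(netstat_m_output):
    """Return (current, max) from `netstat -m` text, or None if absent."""
    m = CLUSTERS.search(netstat_m_output)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(3))


def should_reboot(netstat_m_output, threshold=THRESHOLD):
    """True when current cluster usage has reached threshold * max."""
    counts = clusters_in_use(netstat_m_output)
    if counts is None:
        return False
    current, limit = counts
    return current >= threshold * limit


def check_and_reboot(logfile="/var/log/mbufwatch.log"):  # hypothetical path
    """Run netstat -m, append a timestamped copy to a log, reboot if needed."""
    text = subprocess.run(["netstat", "-m"], capture_output=True,
                          text=True, check=True).stdout
    with open(logfile, "a") as log:
        log.write(time.ctime() + "\n" + text + "\n")
    if should_reboot(text):
        subprocess.run(["shutdown", "-r", "now", "mbuf clusters exhausted"])
```

With the two snapshots above, the first (64 of 10000 clusters) would not
trigger a reboot and the second (2487 of 10000) would.  Keeping the
decision in should_reboot() separate from the netstat and shutdown calls
makes it easy to test against saved log output.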

My SMB machine, which is otherwise slightly flaky, has never shown this
problem.  However, it is behind the router machine, so if a fragment or
similar attack caused packets to be dropped at the router, the SMB machine
would never see them.  The DNS server is not behind anything that limits
traffic except a Packeteer.  After this started happening, a graph for my
IP was added to the Packeteer, so it may be possible to look for incoming
bursts.  But the logs I have kept make the onset look VERY sudden, and the
Packeteer graphs work like MRTG, at 5-minute intervals.

I may add ipfw list output to these logs, but in the past it hasn't shown
anything very useful.

I'm doing a system build on the DNS machine as I write this; we'll see
whether any commits since Wednesday morning (the previous build) help.

As I said, it broke drastically in the middle of last week, then was
partially fixed on Monday this week.

-- Pete