cvs commit: src/sys/netinet tcp_input.c
dillon at apollo.backplane.com
Wed Apr 18 20:38:36 PDT 2007
:On Tue, 17 Apr 2007 10:28:04 -0700 (PDT)
:Matthew Dillon <dillon at crater.dragonflybsd.org> wrote:
:> The possible trigger is
:> running netstat -an on a machine very heavily loaded with 6000+
:> network connections.
:I ran netstat -an on a DF 1.8.1 proxy with a few hundred (1000 at most) connections and it did not crash, printed out all connections.
:Gergo Szakal <bastyaelvtars at gmail.com>
It is starting to make more sense. I think what is happening is that
a callout timer is getting held up long enough for the TCP state to
change radically, due to the huge netstat -an, whose data is being loaded
via a sysctl. I committed a fix for one related problem to HEAD but I
don't know if it is the one causing the crash. Another possibility is
that the callout code is not properly detecting when a callout gets
ripped out from under it after blocking on the big giant lock.
The larger the amount of information the sysctl has to load (i.e. the
more connections the box has active), the longer it holds onto the big
giant lock and the longer the callout gets stalled.
The *real* fix for the problem is probably to have the callout queue
a message to the TCP thread instead of issuing a callback, which would
allow the network callouts to run without the big giant lock. That's
fairly involved work and not something I can focus on right now.
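
A rough sketch of the message-queueing idea, as a single-threaded userland
simulation rather than kernel code. Every name here (struct conn, timer_gen,
struct msgq, callout_fire, tcp_thread_drain) is hypothetical and made up for
illustration; none of it is taken from the actual DragonFly sources. The
assumption is that the timer side only enqueues a small message and never
touches TCP state, while the thread that owns the connection re-checks the
state before acting, so a timer that was held up while the state changed is
simply dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection state; not the real tcpcb. */
enum conn_state { CONN_ESTABLISHED, CONN_CLOSED };

struct conn {
    enum conn_state state;
    unsigned timer_gen;   /* bumped whenever the timer is cancelled or re-armed */
    int retransmits;      /* stands in for the real timeout work */
};

/* Message the callout hands to the TCP thread instead of calling back. */
struct timer_msg {
    struct conn *c;
    unsigned gen;         /* value of timer_gen when the timer was armed */
};

#define MSGQ_SIZE 8

struct msgq {
    struct timer_msg slots[MSGQ_SIZE];
    size_t head;          /* next slot to read */
    size_t tail;          /* next slot to write */
};

/*
 * Timer side: only queue a message. No TCP state is read or modified
 * here, so in this model the callout needs no lock on the connection.
 * Returns 0 on success, -1 if the queue is full.
 */
int callout_fire(struct msgq *q, struct conn *c, unsigned armed_gen)
{
    assert(q != NULL && c != NULL);
    if (q->tail - q->head == MSGQ_SIZE)
        return -1;
    q->slots[q->tail % MSGQ_SIZE] = (struct timer_msg){ c, armed_gen };
    q->tail++;
    return 0;
}

/*
 * TCP thread side: drain the queue and act only on messages that still
 * match the connection's current state. Returns the number of timeouts
 * actually handled; stale messages are discarded.
 */
int tcp_thread_drain(struct msgq *q)
{
    int handled = 0;

    while (q->head != q->tail) {
        struct timer_msg m = q->slots[q->head % MSGQ_SIZE];
        q->head++;

        /* The state may have changed while the message sat in the queue. */
        if (m.c->state != CONN_ESTABLISHED || m.gen != m.c->timer_gen)
            continue;

        m.c->retransmits++;
        handled++;
    }
    return handled;
}
```

If two timers fire and one of the connections is closed (and its timer_gen
bumped) before the TCP thread gets around to draining the queue, only the
timeout for the still-valid connection is processed. A real implementation
would also have to deal with connection lifetime (a queued message must not
outlive the structure it points to) and with cross-CPU queueing, which this
sketch ignores.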