Internet problem after recent rewrite of mbuf

Sarunas Vancevicius vsarunas at eircom.net
Tue Aug 10 02:05:37 PDT 2004


On 16:28, Mon 09 Aug 04, Matthew Dillon wrote:
> 
> :On 10:15, Mon 09 Aug 04, Matthew Dillon wrote:
> :>     I don't think it could be the 29 July commit, but it could
> :>     be one of the later ones.
> :
> :I backed out changes to 3 August when the new network code
> :went in, and that wasn't the problem.
> 
>     So you are saying it was something that went in after Aug 3 or are
>     you saying that it is something that went in on Aug 3 ?

Sorry, I mixed up the dates. I just went back to 02 August,
so the revisions are:

$DragonFly: src/sys/netinet/tcp_output.c,v 1.16 2004/07/17 20:31:31 hsu Exp $
$DragonFly: src/sys/netinet/tcp_input.c,v 1.32 2004/07/27 17:57:02 drhodus Exp $

And the problem is there. So it must be commits between 29
July and 02 August.

>     What is the cvs rev on the file /usr/src/sys/netinet/tcp_input.c and
>     /usr/src/sys/netinet/tcp_output.c of the last known working kernel?

A working kernel of 29 July:

$DragonFly: src/sys/netinet/tcp_input.c,v 1.32 2004/07/27 17:57:02 drhodus Exp $
$DragonFly: src/sys/netinet/tcp_output.c,v 1.16 2004/07/17 20:31:31 hsu Exp $

>     From examining your output the point where it really starts to stall
>     is here:
> 
> 21:21:04.752575 IP 194.125.183.39.3340 > 207.171.163.90.80: P 909:1389(480) ack 659 win 58400
> 21:21:06.949765 IP 194.125.183.39.3340 > 207.171.163.90.80: P 909:1389(480) ack 659 win 58400
> 21:21:07.300225 IP 207.171.163.90.80 > 194.125.183.39.3340: . ack 1389 win 6432
> 21:21:24.249690 IP 207.171.163.90.80 > 194.125.183.39.3340: . 659:2119(1460) ack 1389 win 6432
> 
>     What this seems to show is packet loss.  Your machine sends a packet but
>     does not get an ack.  Then it sends the packet again.  Then it gets an
>     ack, then 17 seconds later amazon sends you a data packet.
> 
>     What I believe is happening is that amazon in fact sent you a data packet
>     immediately, but the packet was lost, as were a number of retries until
>     17 seconds later a packet amazon sent actually made it through.
> 
>     So the question is what is causing the data loss?  Could it be serial
>     port buffer overflows?  Check your 'dmesg' output when things fail.
>     I suspect it is either something like that, which means that it will in
>     fact work with a later kernel 'sometimes' depending on how fast the
>     modem negotiates its connection.  Or something we did broke ppp.
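
The timing read off the trace above can be checked with a minimal sketch.
It assumes nothing beyond the four timestamps quoted in the tcpdump lines;
the function name gaps is only illustrative.

```python
from datetime import datetime

# Timestamps copied from the four tcpdump lines quoted above.
stamps = [
    "21:21:04.752575",  # first send of 909:1389
    "21:21:06.949765",  # retransmit of 909:1389
    "21:21:07.300225",  # ack 1389 arrives
    "21:21:24.249690",  # first data packet from the server arrives
]

def gaps(timestamps):
    """Return the seconds elapsed between consecutive HH:MM:SS.ffffff stamps."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

for gap in gaps(stamps):
    print(f"{gap:.6f} s")
# prints 2.197190 s, 0.350460 s, 16.949465 s
```

The last gap is the roughly 17 second stall between the ack and the first
data packet that gets through.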

Yes, I have sio0 overflows, and my pppd is taking a lot of
CPU time, like 80-90% (I reported this to you on IRC when I
first got dfly).

Aug 10 08:33:36 laserbeam /kernel: sio0: 1 more silo overflow (total 14)

But I don't experience the packet dropping problem with
earlier kernels. I have had the sio overflows since the 1.0 release.

>     What makes me suspect a serial port issue is that large packets are
>     clearly being dropped more often than small packets.
> 
>     Try reducing the baud rate at which you talk to your modem.  I suspect
>     you have it set to 115200.  Try 38400 and try 9600.
> 
> 						-Matt

Yeah, it was set to 115200. The sio overflows still appear when
it's set to 38400, but they don't appear at 9600.
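
A rough sketch of why a lower baud rate gives the kernel more slack. It
assumes 8N1 framing (10 bits on the wire per byte) and a 16-byte receive
FIFO as on a 16550A-style UART; neither is stated in this thread, and the
real sio driver services the FIFO before it is completely full, so treat
the numbers as an upper bound on the time available.

```python
FRAME_BITS = 10  # assumption: 8N1 framing, 1 start + 8 data + 1 stop bit
FIFO_BYTES = 16  # assumption: 16550A-style receive FIFO depth

def fifo_fill_time_ms(baud, fifo_bytes=FIFO_BYTES, frame_bits=FRAME_BITS):
    """Milliseconds for an empty receive FIFO to fill at line rate."""
    bytes_per_second = baud / frame_bits
    return 1000.0 * fifo_bytes / bytes_per_second

for baud in (115200, 38400, 9600):
    print(f"{baud:>6} baud: FIFO full after {fifo_fill_time_ms(baud):.2f} ms")
# prints 1.39 ms, 4.17 ms and 16.67 ms respectively
```

Under these assumptions the interrupt handler has about 1.4 ms to drain
the FIFO at 115200 but nearly 17 ms at 9600, which fits the overflows
going away only at the lowest rate.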

My userland is a bit out of sync (29 July, with 02 and 09 August
kernels, so top doesn't work). I will rebuild userland later
to see if it's the sio overflows that are causing pppd to take 90%
of CPU time.

Thanks,

-- 
Sarunas Vancevicius




