panic in TCP Limited Transmit after RTO

Noritoshi Demizu demizu at dd.iij4u.or.jp
Sat Mar 12 04:40:55 PST 2005


Hi,

 (If this is not the correct mailing list for this kind of report,
  please let me know.  Thank you.)

I have been observing the TCP SACK behavior of DragonFlyBSD recently.
Today, I hit a "sent too much" panic in sys/netinet/tcp_input.c.

The scenario I observed was as follows:

  1. The sender starts slow start and congestion window grows rapidly.

  2. Soon, a router queue overflows and many packets are lost.

  3. The retransmission timer expires.  The sender starts slow start again.
     Note: At this point, snd_nxt becomes much lower than snd_max.
           In my case, the difference was approximately 500KB.

  4. The sender happens to receive a duplicate ACK
     and enters the Limited Transmit code.
     (Line 1956 through 2001 of tcp_input.c rev 1.54)

	4-0. (Since this is the first Limited Transmit call, t_dupacks = 1)
	4-1. ownd = snd_max - snd_una (= about 500KB) at L.1959
	4-2. Since t_dupacks = 1, snd_limited = 0 at L.1968.
	4-3. snd_cwnd = ownd + MSS (= about 500KB) at L.1969
	4-4. tcp_output() is called at L.1972
	     It sends many data segments because snd_cwnd is so large.
	4-5. sent = snd_max - oldsndmax (= number of SACKed bytes, I guess)
	     In my case, sent was more than 30KB.
	4-6. Since sent > t_maxseg, the condition in KASSERT() is evaluated;
	     it fails and panic() is called.
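
The arithmetic in step 4 can be sketched as a small user-space model.
This is only an illustration, not the kernel code: the helper names
(sendable, budget_from_snd_max, budget_from_snd_nxt) are hypothetical,
and the model assumes that tcp_output() may send snd_cwnd minus the
offset of snd_nxt from snd_una, ignoring the receiver window and SACK.

```c
#include <stdint.h>

/*
 * Hypothetical model: bytes the sender may emit when the congestion
 * window is cwnd and sending resumes at snd_nxt.  The receiver window
 * and SACK information are deliberately ignored.
 */
static uint32_t
sendable(uint32_t cwnd, uint32_t snd_una, uint32_t snd_nxt)
{
	uint32_t off = snd_nxt - snd_una;	/* bytes already behind snd_nxt */

	return (cwnd > off ? cwnd - off : 0);
}

/* Steps 4-1 and 4-3 as described above: ownd is taken from snd_max. */
static uint32_t
budget_from_snd_max(uint32_t snd_una, uint32_t snd_nxt, uint32_t snd_max,
    uint32_t mss)
{
	uint32_t ownd = snd_max - snd_una;

	return (sendable(ownd + mss, snd_una, snd_nxt));
}

/* The same steps with ownd taken from snd_nxt instead of snd_max. */
static uint32_t
budget_from_snd_nxt(uint32_t snd_una, uint32_t snd_nxt, uint32_t mss)
{
	uint32_t ownd = snd_nxt - snd_una;

	return (sendable(ownd + mss, snd_una, snd_nxt));
}
```

With assumed values mss = 1460, snd_una = 1000000, snd_nxt = snd_una + 1460
(one segment retransmitted after the RTO) and snd_max = snd_una + 500000,
budget_from_snd_max() returns 500000 while budget_from_snd_nxt() returns
1460, i.e. a single segment.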

I think there are two causes of the "sent too much" panic.

  o At step 4-1, the outstanding window is estimated as snd_max - snd_una.
    I think snd_nxt should be used instead of snd_max.  Otherwise,
    snd_cwnd becomes too large in some cases.

  o At steps 4-1 and 4-5, calculations ignore SACKed bytes.

So, I think:

  o Outstanding window should be calculated as

    ownd = (snd_nxt - snd_una) - "SACKed_bytes_below_snd_nxt".

  o Sent bytes should be calculated as

    sent = (new snd_nxt - old snd_nxt)
           - "SACKed bytes between new snd_nxt and old snd_nxt".

    where "new snd_nxt" means tp->snd_nxt value after tcp_output() is called
    and "old snd_nxt" means tp->snd_nxt value before tcp_output() is called.
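
The two formulas above can be tried out in a minimal user-space sketch.
Everything here is an assumption made for illustration: struct sack_block,
sacked_bytes_below(), outstanding() and newly_sent() are hypothetical
names, and the SACK blocks are assumed to be non-overlapping half-open
ranges [start, end) above snd_una.  It is not the kernel scoreboard code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SACK block covering sequence numbers [start, end). */
struct sack_block {
	uint32_t start;
	uint32_t end;
};

/* Wraparound-safe "a is before b" for 32-bit sequence numbers. */
static int
seq_lt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) < 0);
}

/* Count the SACKed bytes that lie below seq. */
static uint32_t
sacked_bytes_below(const struct sack_block *blk, size_t n, uint32_t seq)
{
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t end;

		if (!seq_lt(blk[i].start, seq))
			continue;	/* block starts at or above seq */
		/* Clip a block that straddles seq. */
		end = seq_lt(blk[i].end, seq) ? blk[i].end : seq;
		total += end - blk[i].start;
	}
	return (total);
}

/* ownd = (snd_nxt - snd_una) - SACKed bytes below snd_nxt */
static uint32_t
outstanding(uint32_t snd_una, uint32_t snd_nxt,
    const struct sack_block *blk, size_t n)
{
	return ((snd_nxt - snd_una) - sacked_bytes_below(blk, n, snd_nxt));
}

/* sent = (new - old) - SACKed bytes between old and new snd_nxt */
static uint32_t
newly_sent(uint32_t old_nxt, uint32_t new_nxt,
    const struct sack_block *blk, size_t n)
{
	return ((new_nxt - old_nxt) -
	    (sacked_bytes_below(blk, n, new_nxt) -
	     sacked_bytes_below(blk, n, old_nxt)));
}
```

For example, with assumed SACK blocks [3000, 5000) and [7000, 8000),
moving snd_nxt from 2000 to 6000 skips 2000 SACKed bytes, so newly_sent()
reports 2000 bytes rather than 4000.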

The following patch works fine for my case.

Thanks.

Regards,
Noritoshi Demizu


Index: sys/netinet/tcp_input.c
===================================================================
RCS file: /home/src/os/DragonFlyBSD-cvsup/dcvs/src/sys/netinet/tcp_input.c,v
retrieving revision 1.54
diff -u -r1.54 tcp_input.c
--- sys/netinet/tcp_input.c	9 Mar 2005 06:57:29 -0000	1.54
+++ sys/netinet/tcp_input.c	12 Mar 2005 11:27:21 -0000
@@ -1954,9 +1954,9 @@
 					    (tp->t_dupacks - tp->snd_limited);
 			} else if (tcp_do_limitedtransmit) {
 				u_long oldcwnd = tp->snd_cwnd;
-				tcp_seq oldsndmax = tp->snd_max;
-				/* outstanding data */
-				uint32_t ownd = tp->snd_max - tp->snd_una;
+				tcp_seq oldsndnxt;
+				uint32_t ownd; /* outstanding data */
+				int old_nsacked, new_nsacked;
 				u_int sent;
 
 #define	iceildiv(n, d)		(((n)+(d)-1) / (d))
@@ -1966,12 +1966,19 @@
 				    ("dupacks not 1 or 2"));
 				if (tp->t_dupacks == 1)
 					tp->snd_limited = 0;
+				oldsndnxt = tp->snd_nxt;
+				old_nsacked = tcp_sack_bytes_below(&tp->scb,
+								   oldsndnxt);
+				ownd = (oldsndnxt - tp->snd_una) - old_nsacked;
 				tp->snd_cwnd = ownd +
 				    (tp->t_dupacks - tp->snd_limited) *
 				    tp->t_maxseg;
 				tcp_output(tp);
 				tp->snd_cwnd = oldcwnd;
-				sent = tp->snd_max - oldsndmax;
+				new_nsacked = tcp_sack_bytes_below(&tp->scb,
+								   tp->snd_nxt);
+				sent = (tp->snd_nxt - oldsndnxt)
+					- (new_nsacked - old_nsacked);
 				if (sent > tp->t_maxseg) {
 					KASSERT((tp->t_dupacks == 2 &&
 						 tp->snd_limited == 0) ||





