panic in TCP Limited Transmit after RTO
Noritoshi Demizu
demizu at dd.iij4u.or.jp
Sat Mar 12 04:40:55 PST 2005
Hi,
(If this is not the correct mailing list for this kind of report,
please let me know. Thank you.)
I have been observing the TCP SACK behavior of DragonFlyBSD these days.
Today, I experienced a "sent too much" panic in sys/netinet/tcp_input.c.
The scenario I observed was as follows:
1. The sender starts slow start and congestion window grows rapidly.
2. Soon, a router queue overflows and many packets are lost.
3. The retransmission timer expires. The sender starts slow start again.
Note: At this point, snd_nxt becomes much lower than snd_max.
In my case, the difference was approximately 500KB.
4. The sender happens to receive a duplicate ACK.
And it enters the Limited Transmit code.
(Lines 1956 through 2001 of tcp_input.c rev 1.54)
4-0. (Since this is the first Limited Transmit call, t_dupacks = 1)
4-1. ownd = snd_max - snd_una (= about 500KB) at L.1959
4-2. Since t_dupacks = 1, snd_limited = 0 at L.1968.
4-3. snd_cwnd = ownd + MSS (= about 500KB) at L.1969
4-4. tcp_output() is called at L.1972
It sends many data segments because snd_cwnd is so large.
4-5. sent = snd_max - oldsndmax (= number of SACKed bytes, I guess)
In my case, sent was more than 30KB.
4-6. Since sent > t_maxseg, the condition in KASSERT() is examined,
it fails, and panic() is called.
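To make the arithmetic in step 4 concrete, here is a minimal sketch in C. It is not the kernel code: lt_cwnd() and sendable() are hypothetical helpers, and sendable() is a simplified model of how much tcp_output() may transmit (it ignores the receiver window, the send buffer size and SACK holes).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;	/* 32-bit sequence number; subtraction wraps */

/*
 * Temporary congestion window used for Limited Transmit.
 * `edge` is the sequence number taken as the end of outstanding data:
 * snd_max as in rev 1.54, or snd_nxt as suggested in this report.
 */
static uint32_t
lt_cwnd(tcp_seq snd_una, tcp_seq edge, int dupacks, int snd_limited,
    uint32_t maxseg)
{
	uint32_t ownd;

	/* Limited Transmit only runs on the first or second duplicate ACK. */
	assert(dupacks == 1 || dupacks == 2);

	ownd = edge - snd_una;					/* step 4-1 */
	return ownd + (uint32_t)(dupacks - snd_limited) * maxseg; /* step 4-3 */
}

/*
 * Simplified model (an assumption, not the real tcp_output()):
 * the sender may transmit from snd_nxt up to snd_una + cwnd.
 */
static uint32_t
sendable(tcp_seq snd_una, tcp_seq snd_nxt, uint32_t cwnd)
{
	uint32_t off = snd_nxt - snd_una;

	return cwnd > off ? cwnd - off : 0;
}
```

For example, with an MSS of 1460, snd_nxt one MSS above snd_una and snd_max 512000 bytes above snd_una, the snd_max based window is 513460 and this model allows 512000 bytes to be sent, while the snd_nxt based window is 2920 and allows only 1460 bytes.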
I think there are two causes of the "sent too much" panic.
o At step 4-1, the outstanding window is estimated as snd_max - snd_una.
I think snd_nxt should be used instead of snd_max. Otherwise,
snd_cwnd becomes too large in some cases.
o At steps 4-1 and 4-5, the calculations ignore SACKed bytes.
So, I think:
o Outstanding window should be calculated as
ownd = (snd_nxt - snd_una) - "SACKed_bytes_below_snd_nxt".
o Sent bytes should be calculated as
sent = (new snd_nxt - old snd_nxt)
- "SACKed bytes between new snd_nxt and old snd_nxt".
where "new snd_nxt" means tp->snd_nxt value after tcp_output() is called
and "old snd_nxt" means tp->snd_nxt value before tcp_output() is called.
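The two formulas above can be sketched in standalone C as follows. This is only an illustration under stated assumptions: struct sack_block and sack_bytes_below() are hypothetical stand-ins for the kernel's SACK scoreboard, and the blocks are assumed to be non-overlapping half-open ranges [start, end) above snd_una.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-safe "a is before b" comparison of sequence numbers. */
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)

/* Hypothetical scoreboard entry: SACKed bytes in [start, end). */
struct sack_block {
	tcp_seq start;
	tcp_seq end;
};

/* Count the SACKed bytes that lie below seq. */
static uint32_t
sack_bytes_below(const struct sack_block *blk, size_t n, tcp_seq seq)
{
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		tcp_seq end;

		assert(!SEQ_LT(blk[i].end, blk[i].start));
		if (!SEQ_LT(blk[i].start, seq))
			continue;	/* block is entirely at or above seq */
		/* Clip a block that straddles seq. */
		end = SEQ_LT(blk[i].end, seq) ? blk[i].end : seq;
		total += end - blk[i].start;
	}
	return total;
}

/* ownd = (snd_nxt - snd_una) - SACKed bytes below snd_nxt */
static uint32_t
outstanding(tcp_seq snd_una, tcp_seq snd_nxt,
    const struct sack_block *blk, size_t n)
{
	return (snd_nxt - snd_una) - sack_bytes_below(blk, n, snd_nxt);
}

/* sent = (new - old snd_nxt) - SACKed bytes between old and new snd_nxt */
static uint32_t
bytes_sent(tcp_seq old_nxt, tcp_seq new_nxt,
    const struct sack_block *blk, size_t n)
{
	return (new_nxt - old_nxt) -
	    (sack_bytes_below(blk, n, new_nxt) -
	     sack_bytes_below(blk, n, old_nxt));
}
```

For example, with SACK blocks [3000, 5000) and [7000, 8000), moving snd_nxt from 2000 to 9000 covers 7000 bytes of sequence space, of which 3000 are already SACKed, so bytes_sent() reports 4000.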
The following patch works fine for my case.
Thanks.
Regards,
Noritoshi Demizu
Index: sys/netinet/tcp_input.c
===================================================================
RCS file: /home/src/os/DragonFlyBSD-cvsup/dcvs/src/sys/netinet/tcp_input.c,v
retrieving revision 1.54
diff -u -r1.54 tcp_input.c
--- sys/netinet/tcp_input.c 9 Mar 2005 06:57:29 -0000 1.54
+++ sys/netinet/tcp_input.c 12 Mar 2005 11:27:21 -0000
@@ -1954,9 +1954,9 @@
(tp->t_dupacks - tp->snd_limited);
} else if (tcp_do_limitedtransmit) {
u_long oldcwnd = tp->snd_cwnd;
- tcp_seq oldsndmax = tp->snd_max;
- /* outstanding data */
- uint32_t ownd = tp->snd_max - tp->snd_una;
+ tcp_seq oldsndnxt;
+ uint32_t ownd; /* outstanding data */
+ int old_nsacked, new_nsacked;
u_int sent;
#define iceildiv(n, d) (((n)+(d)-1) / (d))
@@ -1966,12 +1966,19 @@
("dupacks not 1 or 2"));
if (tp->t_dupacks == 1)
tp->snd_limited = 0;
+ oldsndnxt = tp->snd_nxt;
+ old_nsacked = tcp_sack_bytes_below(&tp->scb,
+ oldsndnxt);
+ ownd = (oldsndnxt - tp->snd_una) - old_nsacked;
tp->snd_cwnd = ownd +
(tp->t_dupacks - tp->snd_limited) *
tp->t_maxseg;
tcp_output(tp);
tp->snd_cwnd = oldcwnd;
- sent = tp->snd_max - oldsndmax;
+ new_nsacked = tcp_sack_bytes_below(&tp->scb,
+ tp->snd_nxt);
+ sent = (tp->snd_nxt - oldsndnxt)
+ - (new_nsacked - old_nsacked);
if (sent > tp->t_maxseg) {
KASSERT((tp->t_dupacks == 2 &&
tp->snd_limited == 0) ||