net.inet.tcp.inflight_enable
Matthew Dillon
dillon at apollo.backplane.com
Tue Jul 12 10:12:11 PDT 2005
:In my understanding, BDP limiting tries to estimate the bandwidth-delay
:product and tries to avoid injecting too many data segments into
:the network. If so, the right points to evaluate the implementation
:of BDP limiting would be as follows:
:
The algorithm is described in /usr/src/sys/netinet/tcp_subr.c, starting
at around line 1783. It calculates the bandwidth-delay product by
multiplying the minimum observed RTT by the observed bandwidth. It's
virtually impossible to calculate it any other way, because most of the
parameters are unstable and would create a positive feedback loop in
the calculation (== a wildly unstable calculation).
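A minimal user-space sketch of that product follows. It assumes the units implied by the patch below: bandwidth in bytes per second (bytes acked times hz over elapsed ticks) and RTT in clock ticks. The names struct bdp_est, bdp_rtt_sample() and bdp_bytes() are hypothetical. This is not the kernel's code, and the kernel's slop term is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space sketch, not the tcp_subr.c code.
 * Units (assumed): bw in bytes/sec, RTT in clock ticks, hz ticks/sec.
 */
struct bdp_est {
	int64_t	bw;		/* smoothed bandwidth, bytes/sec */
	int	rtt_min;	/* minimum RTT seen so far, ticks; 0 = none */
	int	hz;		/* clock ticks per second */
};

/* Record an RTT sample; only a new minimum changes the stored value. */
static void
bdp_rtt_sample(struct bdp_est *e, int rtt_ticks)
{
	if (rtt_ticks > 0 && (e->rtt_min == 0 || rtt_ticks < e->rtt_min))
		e->rtt_min = rtt_ticks;
}

/* BDP in bytes: (bytes/sec) * ticks / (ticks/sec). */
static int64_t
bdp_bytes(const struct bdp_est *e)
{
	assert(e->hz > 0);
	return e->bw * e->rtt_min / e->hz;
}
```

With 1,250,000 bytes/s (10 Mbit/s), RTT samples of 45, 40 and 60 ticks, and hz = 1000, the minimum stays at 40 ticks and the product is 50,000 bytes. Taking the minimum means a sample inflated by queueing cannot raise the RTT term.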
The expansion window is supposed to continue to operate due to the slop
added to the BDP calculation, but clearly it isn't working as well as
it could. I think the issue is simply that it takes a while for the
bandwidth calculation to average up.
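To put a rough number on "a while", the sketch below repeats the pre-patch 1/16 update with the same integer shift. It counts how many samples of a constant bandwidth an estimate starting at 0 needs to reach a given percentage. The helper names ema16_step() and samples_to_reach() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch smoothing step: new = (15 * old + sample) / 16, via a shift. */
static int64_t
ema16_step(int64_t old, int64_t sample)
{
	return (old * 15 + sample) >> 4;
}

/*
 * Feed a constant bandwidth `bw` into an estimate that starts at `start`
 * and return the number of samples needed to reach `pct` percent of it,
 * or -1 if that does not happen within `max_iter` samples.
 */
static int
samples_to_reach(int64_t start, int64_t bw, int pct, int max_iter)
{
	int64_t est = start;
	int n;

	assert(pct > 0 && pct <= 100);
	for (n = 1; n <= max_iter; ++n) {
		est = ema16_step(est, bw);
		if (est * 100 >= bw * pct)
			return n;
	}
	return -1;
}
```

Assuming a steady 1,000,000 bytes/s, the first sample only lifts the estimate to 62,500, and 36 samples are needed to reach 90%. Because the shift truncates, the estimate stalls a few units below the true value and never reaches 100%.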
There are a number of possible solutions here, including storing the
bandwidth in the route table so later connections can start from the
last observed bandwidth rather than from 0. Another way would be to keep
track of the number of bandwidth calculations that have occurred and,
instead of averaging each new sample in at 1/16, give the first few
samples a much bigger piece of the pie.
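The second idea can be tried outside the kernel with a sketch like this one, which mirrors the arithmetic of the patch that follows. The names struct bw_avg and bw_avg_update() are hypothetical, and count plays the role of the patch's snd_bandwidth_counter. While count is below 15, each sample is averaged in with weight 1/(count + 1), which is a plain running mean. At count 15 the weight is 1/16, exactly what a running mean would give the 16th sample, so the switch to the fixed 1/16 average is seamless.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-alone version of the proposed warm-up averaging. */
struct bw_avg {
	int64_t		bw;	/* current bandwidth estimate, bytes/sec */
	unsigned	count;	/* samples in the running mean, capped at 15 */
};

static void
bw_avg_update(struct bw_avg *a, int64_t sample)
{
	assert(sample >= 0);
	if (a->count < 15) {
		/* Running mean: sample number count+1 gets 1/(count+1). */
		a->bw = (a->bw * a->count + sample) / (a->count + 1);
		++a->count;
	} else {
		/* Steady state: fixed 1/16 exponential average. */
		a->bw = (a->bw * 15 + sample) >> 4;
	}
}
```

Starting from a zeroed state and fed 1,000,000 bytes/s, the estimate is 1,000,000 after the first sample. A plain 1/16 average starting from 0 would be at 62,500.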
Here is a patch that implements the second idea. See if it helps.
-Matt
Index: tcp_subr.c
===================================================================
RCS file: /cvs/src/sys/netinet/tcp_subr.c,v
retrieving revision 1.49
diff -u -r1.49 tcp_subr.c
--- tcp_subr.c 2 Jun 2005 23:52:42 -0000 1.49
+++ tcp_subr.c 12 Jul 2005 16:58:22 -0000
@@ -1845,6 +1845,7 @@
 	if (!tcp_inflight_enable) {
 		tp->snd_bwnd = TCP_MAXWIN << TCP_MAX_WINSHIFT;
 		tp->snd_bandwidth = 0;
+		tp->snd_bandwidth_counter = 0;
 		return;
 	}
@@ -1857,8 +1858,6 @@
 	if (tp->t_bw_rtttime == 0 || delta_ticks < 0 || delta_ticks > hz * 10) {
 		tp->t_bw_rtttime = ticks;
 		tp->t_bw_rtseq = ack_seq;
-		if (tp->snd_bandwidth == 0)
-			tp->snd_bandwidth = tcp_inflight_min;
 		return;
 	}
 	if (delta_ticks == 0)
@@ -1881,7 +1880,13 @@
 	bw = (int64_t)(ack_seq - tp->t_bw_rtseq) * hz / delta_ticks;
 	tp->t_bw_rtttime = save_ticks;
 	tp->t_bw_rtseq = ack_seq;
-	bw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4;
+	if (tp->snd_bandwidth_counter < 15) {
+		bw = ((int64_t)tp->snd_bandwidth * tp->snd_bandwidth_counter +
+		    bw) / (tp->snd_bandwidth_counter + 1);
+		++tp->snd_bandwidth_counter;
+	} else {
+		bw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4;
+	}
 	tp->snd_bandwidth = bw;
Index: tcp_var.h
===================================================================
RCS file: /cvs/src/sys/netinet/tcp_var.h,v
retrieving revision 1.35
diff -u -r1.35 tcp_var.h
--- tcp_var.h 10 May 2005 15:48:10 -0000 1.35
+++ tcp_var.h 12 Jul 2005 16:54:02 -0000
@@ -258,6 +258,7 @@
 	u_long t_badrxtwin; /* window for retransmit recovery */
 	u_long t_rexmtTS; /* timestamp of last retransmit */
 	u_char snd_limited; /* segments limited transmitted */
+	u_char snd_bandwidth_counter; /* initial bandwidth calculations */
 	tcp_seq rexmt_high; /* highest seq # retransmitted + 1 */
 	tcp_seq snd_max_rexmt; /* snd_max when rexmting snd_una */
More information about the Kernel mailing list