net.inet.tcp.inflight_enable

Matthew Dillon dillon at apollo.backplane.com
Tue Jul 12 10:12:11 PDT 2005


:In my understanding, BDP limiting tries to estimate the bandwidth-delay
:product and to avoid injecting too many data segments into the network.
:If so, the right points at which to evaluate the implementation of BDP
:limiting would be as follows:
:

    The algorithm is described in /usr/src/sys/netinet/tcp_subr.c, starting
    at around line 1783.  It basically calculates the bandwidth-delay
    product by taking the minimum observed RTT and multiplying it by the
    observed bandwidth.  It's virtually impossible to calculate it any
    other way, because most of the parameters are unstable and would cause
    a positive feedback loop in the calculation (== a wildly unstable
    calculation).
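
    For reference, the core of the calculation boils down to the estimated
    bandwidth times the minimum observed RTT, plus a little slop.  A rough
    sketch only, not the actual tcp_subr.c code, and the names here are
    made up:

	#include <stdint.h>

	/*
	 * Sketch: bw is the smoothed bandwidth estimate in bytes/sec,
	 * rtt_min_ticks is the minimum observed RTT in ticks, hz is the
	 * tick rate, and stab_segs * maxseg stands in for the slop that
	 * keeps the window a bit larger than the raw bandwidth-delay
	 * product.
	 */
	int64_t
	bdp_window(int64_t bw, int rtt_min_ticks, int hz,
	    int maxseg, int stab_segs)
	{
		/* bytes/sec * seconds of minimum RTT = bytes in flight */
		int64_t bwnd = bw * rtt_min_ticks / hz;

		/* slop so the window can keep expanding past the estimate */
		bwnd += (int64_t)stab_segs * maxseg;
		return (bwnd);
	}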

    The expansion window is supposed to continue to operate due to the slop
    added to the BDP calculation, but clearly it isn't working as well as
    it could.  I think the issue is simply that it takes a while for the
    bandwidth calculation to average up.
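
    To put a rough number on it: each new sample only gets a 1/16 weight,
    so after n samples the estimate has closed only a 1 - (15/16)^n
    fraction of the gap between its starting value and the real bandwidth,
    which is about 48% after 10 samples and 64% after 16.  A short
    connection can easily finish before the estimate gets anywhere close.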

    There are a number of possible solutions here, including storing the
    bandwidth in the route table so later connections can start from the
    last observed bandwidth rather than from 0.  Another way would be to
    keep track of the number of bandwidth calculations that have occurred
    and, instead of averaging 1/16 in on each iteration, give the first
    few samples a much bigger piece of the pie.
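
    With the second approach the first sample becomes the estimate
    outright, the second is averaged in at a weight of 1/2, the third at
    1/3, and so on, until the weighting settles at the usual 1/16 from the
    16th sample onward.  In other words it is a plain running average over
    the first batch of samples, so the estimate becomes usable almost
    immediately.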

    Here is a patch that implements the second idea.  See if it helps.

						-Matt

Index: tcp_subr.c
===================================================================
RCS file: /cvs/src/sys/netinet/tcp_subr.c,v
retrieving revision 1.49
diff -u -r1.49 tcp_subr.c
--- tcp_subr.c	2 Jun 2005 23:52:42 -0000	1.49
+++ tcp_subr.c	12 Jul 2005 16:58:22 -0000
@@ -1845,6 +1845,7 @@
 	if (!tcp_inflight_enable) {
 		tp->snd_bwnd = TCP_MAXWIN << TCP_MAX_WINSHIFT;
 		tp->snd_bandwidth = 0;
+		tp->snd_bandwidth_counter = 0;
 		return;
 	}
 
@@ -1857,8 +1858,6 @@
 	if (tp->t_bw_rtttime == 0 || delta_ticks < 0 || delta_ticks > hz * 10) {
 		tp->t_bw_rtttime = ticks;
 		tp->t_bw_rtseq = ack_seq;
-		if (tp->snd_bandwidth == 0)
-			tp->snd_bandwidth = tcp_inflight_min;
 		return;
 	}
 	if (delta_ticks == 0)
@@ -1881,7 +1880,13 @@
 	bw = (int64_t)(ack_seq - tp->t_bw_rtseq) * hz / delta_ticks;
 	tp->t_bw_rtttime = save_ticks;
 	tp->t_bw_rtseq = ack_seq;
-	bw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4;
+	if (tp->snd_bandwidth_counter < 15) {
+	    bw = ((int64_t)tp->snd_bandwidth * tp->snd_bandwidth_counter +
+		 bw) / (tp->snd_bandwidth_counter + 1);
+	    ++tp->snd_bandwidth_counter;
+	} else {
+	    bw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4;
+	}
 
 	tp->snd_bandwidth = bw;
 
Index: tcp_var.h
===================================================================
RCS file: /cvs/src/sys/netinet/tcp_var.h,v
retrieving revision 1.35
diff -u -r1.35 tcp_var.h
--- tcp_var.h	10 May 2005 15:48:10 -0000	1.35
+++ tcp_var.h	12 Jul 2005 16:54:02 -0000
@@ -258,6 +258,7 @@
 	u_long	t_badrxtwin;		/* window for retransmit recovery */
 	u_long	t_rexmtTS;		/* timestamp of last retransmit */
 	u_char	snd_limited;		/* segments limited transmitted */
+	u_char	snd_bandwidth_counter;	/* initial bandwidth calculations */
 
 	tcp_seq	rexmt_high;		/* highest seq # retransmitted + 1 */
 	tcp_seq	snd_max_rexmt;		/* snd_max when rexmting snd_una */
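
    (To try it, assuming a stock source tree: apply the diff in
    /usr/src/sys/netinet with patch(1), rebuild and boot the new kernel,
    and make sure net.inet.tcp.inflight_enable is set to 1 while testing.)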




