net.inet.tcp.inflight_enable

Noritoshi Demizu <demizu at dd.iij4u.or.jp>
Sat Jul 16 09:14:36 PDT 2005


Sorry for my late response.

>     The algorithm is described in /usr/src/sys/netinet/tcp_subr.c, starting
>     at around line 1783.  It basically calculates the bandwidth delay
>     product by taking the minimum observed RTT and multiplying it against
>     the observed bandwidth.  It's virtually impossible to calculate it any
>     other way, because most of the parameters are unstable and would cause
>     a positive feedback loop in the calculation to occur (== wildly unstable
>     calculation).

Of course, I have read that code.  But I doubt that the bandwidth can
be estimated accurately by that method.
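
To make my concern concrete, here is my simplified reading of that
calculation (a sketch only, not the code in tcp_subr.c; the names below
are mine): the bandwidth sample is the number of bytes acked since the
last sample divided by the number of ticks elapsed, each sample is
averaged in with a 1/16 weight, and the window limit is the smoothed
bandwidth multiplied by the minimum observed RTT.  The tick resolution
therefore enters both the sample denominator and the RTT.

#include <stdint.h>

/*
 * Simplified model of the inflight estimate (all names here are mine,
 * not the kernel's).  "hz" is the tick rate, so both the sample
 * interval and the RTT are only known to a resolution of 1/hz.
 */
static int64_t
bandwidth_sample(int64_t bytes_acked, int hz, int elapsed_ticks)
{
	/* bytes per second, measured over whole ticks only */
	return (bytes_acked * hz / elapsed_ticks);
}

static int64_t
smooth_bandwidth(int64_t old_bw, int64_t sample)
{
	/* exponential average: each new sample gets a 1/16 weight */
	return ((old_bw * 15 + sample) / 16);
}

static int64_t
bdp_window(int64_t smoothed_bw, int min_rtt_ticks, int hz)
{
	/* bandwidth-delay product = bandwidth * minimum observed RTT */
	return (smoothed_bw * min_rtt_ticks / hz);
}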

I ran some more experiments and found that the throughput improves as
HZ is increased.  Note that bandwidth=100Mbit/s=12.5Mbyte/s
and RTT=20ms in my experimental environment.  Here are the measured
times to send 512KB of data without losses.

	net.inet.tcp.inflight_enable = 0, HZ=100	167ms
	net.inet.tcp.inflight_enable = 1, HZ=100	1305ms
	net.inet.tcp.inflight_enable = 1, HZ=1000	398ms
	net.inet.tcp.inflight_enable = 1, HZ=10000	189ms
	(See http://www.demizu.org/~noritosi/memo/2005/0716/ )

In the second experiment, HZ=100 (1/HZ = 10ms) and RTT=20ms.
In this case, the estimated bandwidth (tp->snd_bandwidth) at the end
of the transfer was about 530Kbyte/s, while it should be 12.5Mbyte/s.
http://www.demizu.org/~noritosi/memo/2005/0716/df-inflight-1000hz-log.html
(These values are stored in the kernel memory and printed out after
 all data has been transferred and acked.)

On the other hand, in the fourth experiment, HZ=10000 (1/HZ = 0.1ms)
and RTT=20ms.  In this case, the estimated bandwidth at the end of the
transfer was about 12.9Mbyte/s.  It is close to the actual bandwidth.
http://www.demizu.org/~noritosi/memo/2005/0716/df-inflight-10000hz-log.html
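
Just to show how coarse the tick clock is relative to a 20ms RTT, here
is a toy calculation with numbers of my own (not taken from the traces
above): at 12.5Mbyte/s, roughly 250Kbytes are acked per 20ms RTT, and a
real 20ms sample interval may be charged one extra tick.

#include <stdint.h>
#include <stdio.h>

/*
 * Toy calculation (my own numbers): about 250 Kbytes are acked per
 * 20 ms RTT on a 12.5 Mbyte/s path.  A real 20 ms interval may be
 * charged one extra tick, so a single bandwidth sample can vary this
 * much from tick rounding alone.
 */
int
main(void)
{
	const int64_t bytes_acked = 250 * 1024;
	const int hz_values[] = { 100, 1000, 10000 };
	size_t i;

	for (i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
		int hz = hz_values[i];
		int ticks = 20 * hz / 1000;	/* 20 ms in whole ticks */

		printf("hz=%5d: sample between %8lld and %8lld bytes/s\n",
		    hz,
		    (long long)(bytes_acked * hz / (ticks + 1)),
		    (long long)(bytes_acked * hz / ticks));
	}
	return (0);
}

With HZ=100 a single tick of rounding already changes the sample by
about a third, while with HZ=10000 it changes it by about half a
percent.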

As a result, if one uses BDP limiting, I think 1/HZ should be set to a
value much smaller than the RTT.

By the way, I observed a very good trace.  When HZ=10000 (1/HZ = 0.1ms)
and RTT=20ms, BDP limiting worked as expected.
See http://www.demizu.org/~noritosi/memo/2005/0716a/#good
(This is one of the best shots.  In fact, the transfer often experienced
 packet losses due to router queue overflow.  Nevertheless, since the
 congestion window did not grow too large, lost data were recovered
 quickly.  Note that, without BDP limiting, the congestion window grows
 exponentially, so it takes much longer to recover lost data, as shown in
 http://www.demizu.org/~noritosi/memo/2005/0716a/#off )


>     There are a number of possible solutions here, including storing the
>     bandwidth in the route table so later connections can start from the
>     last observed bandwidth rather than from 0.

I think FreeBSD does this.
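
As an aside, the idea as I understand it amounts to something like the
following (the struct and function names are hypothetical, purely for
illustration; they are not the actual FreeBSD interfaces):

#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the "seed from the route" idea: remember the
 * last smoothed bandwidth per destination and start the next
 * connection's estimate from it instead of from 0.  None of these
 * names are real kernel identifiers.
 */
struct dest_metrics {
	int64_t	cached_bandwidth;	/* bytes/s left by a prior connection */
};

static int64_t
initial_bandwidth(const struct dest_metrics *dm)
{
	/* 0 means "unknown": fall back to measuring from scratch */
	return (dm != NULL ? dm->cached_bandwidth : 0);
}

static void
save_bandwidth(struct dest_metrics *dm, int64_t smoothed_bw)
{
	/* to be called when the connection closes */
	dm->cached_bandwidth = smoothed_bw;
}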

>     Another way would be to keep
>     track of the number of bandwidth calculations that have occurred and
>     instead of averaging 1/16 in on each iteration the first few samples
>     would be given a much bigger piece of the pie.
>
>     Here is a patch that implements the second idea.  See if it helps.

I am sorry I have not tested this patch.
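
If I understand the idea correctly, it would replace the fixed 1/16
averaging with something like the sketch below, where the first few
samples carry a much larger weight (this is my own illustration, not
the posted patch):

#include <stdint.h>

/*
 * My own illustration of the suggested idea, not the posted patch:
 * give the first few bandwidth samples a much larger share of the
 * average, then fall back to the usual 1/16 weighting.
 */
static int64_t
smooth_bandwidth(int64_t old_bw, int64_t sample, int nsamples)
{
	int weight;

	if (nsamples == 0)
		return (sample);	/* first sample: take it verbatim */
	else if (nsamples < 4)
		weight = 2;		/* early samples: 1/2 weight */
	else
		weight = 16;		/* steady state: 1/16 weight */

	return ((old_bw * (weight - 1) + sample) / weight);
}

Taking the first sample verbatim and then ramping the weight down
should let the estimate reach the vicinity of the real bandwidth within
a few RTTs instead of a dozen or more.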

Regards,
Noritoshi Demizu




