Delayed ACK triggered by Header Prediction

Noritoshi Demizu demizu at dd.iij4u.or.jp
Wed Mar 16 00:01:32 PST 2005


I have been using DragonFlyBSD as a TCP receiver since yesterday night in my
experiments, and I found that the number of ACK segments sent in reply
to received data segments is lower than expected.

Normally, ACK segments are sent for every second full-sized data segment.

  (As many of you know, it is called Delayed ACK and is specified in
   section 4.2.3.2 of RFC1122 as follows:

            A TCP SHOULD implement a delayed ACK, but an ACK should not
            be excessively delayed; in particular, the delay MUST be
            less than 0.5 seconds, and in a stream of full-sized
            segments there SHOULD be an ACK for at least every second
            segment.
  )
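
To illustrate the rule, here is a minimal sketch in C of the receiver-side
decision.  All names in it are hypothetical and chosen for illustration
only; it is not taken from any real stack:

  /*
   * Minimal sketch of the RFC 1122 delayed-ACK rule.  The names
   * (rcv_state, should_ack_now, DELACK_TIMEOUT_MS) are hypothetical.
   */
  #define DELACK_TIMEOUT_MS 200   /* must stay well under 500 ms */

  struct rcv_state {
          int unacked_full_segs;  /* full-sized segments not yet ACKed */
  };

  /*
   * Called for each in-order, full-sized data segment received.
   * Returns 1 if an ACK should be sent immediately.
   */
  static int
  should_ack_now(struct rcv_state *rs)
  {
          rs->unacked_full_segs++;
          if (rs->unacked_full_segs >= 2) {  /* ACK every 2nd segment */
                  rs->unacked_full_segs = 0;
                  return 1;
          }
          return 0;  /* otherwise the delayed-ACK timer (at most
                        DELACK_TIMEOUT_MS) will generate the ACK */
  }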

But the Header Prediction code in DragonFlyBSD TCP sends ACK segments
less frequently.  It just queues an output request into tcpcbackq[],
and tcp_willblock() processes the request later.  In my environment
(100Mbps), tcp_willblock() seems to be called less often than once per
two received full-sized data segments.  (I found this by putting
printf()'s in tcp_input(), tcp_output() and tcp_willblock().)
That would explain why the number of ACK segments is lower than
expected.
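
To see how much this matters, here is a small self-contained C model
(userland, not kernel code) that compares "ACK every second full-sized
segment" with "queue and drain later".  The 120 microsecond inter-segment
time (a 1500-byte frame at 100Mbps) and the 1 millisecond drain period
are illustrative assumptions, not measured values:

  #include <stdio.h>

  int
  main(void)
  {
          /* assumptions, not measurements: */
          const double seg_us   = 120.0;  /* 1500-byte frame at 100Mbps */
          const double flush_us = 1000.0; /* assumed drain period       */
          const int nsegs = 10000;

          int acks_delack = 0;  /* ACK every 2nd full-sized segment */
          int acks_queued = 0;  /* one ACK per drain run            */
          double next_flush = flush_us;
          int pending = 0;

          for (int i = 1; i <= nsegs; i++) {
                  double now = i * seg_us;

                  if (i % 2 == 0)
                          acks_delack++;

                  pending++;
                  if (now >= next_flush) {  /* drain routine runs */
                          if (pending > 0)
                                  acks_queued++;
                          pending = 0;
                          next_flush += flush_us;
                  }
          }
          printf("ACK every 2nd segment: %d ACKs for %d segments\n",
                 acks_delack, nsegs);
          printf("queue and drain later: %d ACKs for %d segments\n",
                 acks_queued, nsegs);
          return 0;
  }

With these assumed numbers the queued variant sends roughly one ACK per
eight received segments instead of one per two.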

In my experiments, since DragonFlyBSD sends fewer ACK segments than
expected, the congestion window on the sender machine grows slowly
and the TCP performance becomes poor.

I tried the following:

  1. "sysctl -w net.inet.tcp.avoid_pure_win_update=0"
     But my problem was not solved.

  2. I replaced the code fragment that inserts an output request in
     Header Prediction with code that simply calls tcp_output()
     (see the sketch after this list).  With this change, the TCP
     performance becomes normal, compared with the performance when
     a Linux box is the receiver.
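
For reference, the change in item 2 has roughly the following shape.
This is only a paraphrase to show the idea, not the actual DragonFlyBSD
code or patch; the queueing helper name is hypothetical:

  /*
   * Paraphrase of the spot I changed; not the actual DragonFlyBSD code.
   * "tp" is the connection's control block; tcp_queue_output() is a
   * hypothetical stand-in for the tcpcbackq[] insertion.
   */
  static void
  hdr_pred_data_received(struct tcpcb *tp)
  {
  #ifdef ORIGINAL_BEHAVIOR
          /* defer: put tp on tcpcbackq[]; tcp_willblock() will call
           * tcp_output() for it later */
          tcp_queue_output(tp);
  #else
          /* change tested in item 2: reply immediately */
          tcp_output(tp);
  #endif
  }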

I checked "cvs log".  tcpcbackq[] was introduced on Aug 3, 2004 to
reduce the number of ACK segments across GbE.  Unfortunately, it reduces
TCP performance on a 100Mbps path when DragonFlyBSD acts as a receiver.
I think the same phenomenon will occur when DragonFlyBSD acts as a
receiver across 10GbE.

  What I would like to say here is that when acting as a receiver,
  if the number of ACK segments sent in reply to data segments is reduced,
  TCP performance from the peer node will also be reduced because of
  the standard congestion control algorithm.
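
Concretely, with the standard algorithm (RFC 2581) the sender grows its
congestion window per ACK received, not per segment sent.  Here is a
sketch of the textbook rule; it is not code from any particular stack:

  struct cc_state {
          unsigned long cwnd;      /* congestion window, in bytes    */
          unsigned long ssthresh;  /* slow-start threshold, in bytes */
          unsigned long mss;       /* sender maximum segment size    */
  };

  /* Called once for each ACK that acknowledges new data. */
  static void
  cc_on_ack(struct cc_state *cc)
  {
          if (cc->cwnd < cc->ssthresh)
                  cc->cwnd += cc->mss;                      /* slow start */
          else
                  cc->cwnd += cc->mss * cc->mss / cc->cwnd; /* congestion
                                                               avoidance  */
  }

With one ACK per two segments the sender gets N/2 growth steps for N
delivered segments; if the receiver batches ACKs further, the sender gets
proportionally fewer steps and the window opens correspondingly more
slowly.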

So, I think it is better to send an ACK segment for every second
full-sized data segment even on GbE.  But I have not tried
DragonFlyBSD on GbE yet, so I may be wrong.  I am sorry if that is
the case.

Regards,
Noritoshi Demizu
