git: altq: Implement two level "rough" priority queue for plain sub-queue

Sepherosa Ziehau sephe at
Thu Jun 13 07:04:16 PDT 2013

commit 4cc8caef016d5b3dbf5eb396e27f7eb1e8a6afce
Author: Sepherosa Ziehau <sephe at>
Date:   Sat Jun 8 13:47:43 2013 +0800

    altq: Implement two level "rough" priority queue for plain sub-queue

    The "rough" part comes from two sources:
    - Hardware queues can be deep, normally 512 or more even for GigE
    - Round robin on the transmission queues is used by all of the
      multiple-transmission-queue capable hardware supported by DragonFly
      as of this commit
    These two sources affect the packet priority set by DragonFly.

    DragonFly's "rough" priority queue has only two levels, i.e. high
    priority and normal priority, which should be enough.  Each queue has
    its own header.  The normal priority queue is dequeued only when
    there are no packets in the high priority queue.  During enqueue, if
    the sub-queue is full and the high priority queue length is less than
    half of the sub-queue limit (both packet count and byte count),
    drop-head is applied to the normal priority queue.
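
    The enqueue/dequeue policy described above can be sketched as a small
    standalone C program.  This is only an illustration of the idea, not
    DragonFly's actual if_altq code: all names are hypothetical, only the
    packet count is tracked (the real code also enforces the byte-count
    limit), and the queue limit is shrunk so the drop-head path is easy
    to exercise.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct pkt {
            int prio;               /* 1 = high priority (M_PRIO), 0 = normal */
            struct pkt *next;
    };

    struct pktq {                   /* each priority level has its own header */
            struct pkt *head, *tail;
            int len;
    };

    struct subq {
            struct pktq high, normal;
            int maxlen;             /* sub-queue limit (packet count) */
    };

    static void
    pktq_enq(struct pktq *q, struct pkt *p)
    {
            p->next = NULL;
            if (q->tail != NULL)
                    q->tail->next = p;
            else
                    q->head = p;
            q->tail = p;
            q->len++;
    }

    static struct pkt *
    pktq_deq(struct pktq *q)
    {
            struct pkt *p = q->head;

            if (p == NULL)
                    return NULL;
            q->head = p->next;
            if (q->head == NULL)
                    q->tail = NULL;
            q->len--;
            return p;
    }

    /*
     * Enqueue: if the sub-queue is full and the high priority queue holds
     * less than half of the limit, drop-head the normal queue to make
     * room; otherwise drop the incoming packet.
     */
    static int
    subq_enqueue(struct subq *sq, struct pkt *p)
    {
            if (sq->high.len + sq->normal.len >= sq->maxlen) {
                    if (sq->high.len < sq->maxlen / 2 && sq->normal.len > 0)
                            free(pktq_deq(&sq->normal));    /* drop-head */
                    else
                            return -1;                      /* tail drop */
            }
            pktq_enq(p->prio ? &sq->high : &sq->normal, p);
            return 0;
    }

    /* Dequeue: the normal queue is served only when the high queue is empty. */
    static struct pkt *
    subq_dequeue(struct subq *sq)
    {
            struct pkt *p = pktq_deq(&sq->high);

            return p != NULL ? p : pktq_deq(&sq->normal);
    }

    int
    main(void)
    {
            struct subq sq = { .maxlen = 4 };
            struct pkt *p;
            int i;

            for (i = 0; i < 4; i++) {       /* fill with normal packets */
                    p = calloc(1, sizeof(*p));
                    assert(subq_enqueue(&sq, p) == 0);
            }
            p = calloc(1, sizeof(*p));
            p->prio = 1;
            assert(subq_enqueue(&sq, p) == 0);      /* drop-head made room */
            assert(sq.normal.len == 3);
            p = subq_dequeue(&sq);
            assert(p->prio == 1);                   /* high served first */
            free(p);
            printf("OK\n");
            return 0;
    }
    ```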

    The M_PRIO mbuf flag is added to mark that an mbuf is destined for
    the high priority queue.  Currently TCP uses it to prioritize SYN,
    SYN|ACK, and pure ACK segments w/o FIN and RST.  This behaviour can
    be turned off via the net.inet.tcp.prio_synack sysctl, which is on
    by default.
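
    The segment classification could look roughly like this.  It is a
    hedged sketch, not DragonFly's actual tcp_output code:
    tcp_seg_is_prio and the variable mirroring the
    net.inet.tcp.prio_synack sysctl are illustrative names.

    ```c
    #include <assert.h>
    #include <stdio.h>

    #define TH_FIN  0x01
    #define TH_SYN  0x02
    #define TH_RST  0x04
    #define TH_ACK  0x10

    static int tcp_prio_synack = 1;         /* sysctl default: on */

    /*
     * SYN, SYN|ACK, and pure ACKs (no FIN, no RST, no payload) get the
     * high-priority mark when prioritization is enabled.
     */
    static int
    tcp_seg_is_prio(int flags, int payload_len)
    {
            if (!tcp_prio_synack)
                    return 0;
            if (flags & TH_SYN)             /* SYN and SYN|ACK */
                    return 1;
            if ((flags & (TH_ACK | TH_FIN | TH_RST)) == TH_ACK &&
                payload_len == 0)           /* pure ACK */
                    return 1;
            return 0;
    }

    int
    main(void)
    {
            assert(tcp_seg_is_prio(TH_SYN, 0) == 1);
            assert(tcp_seg_is_prio(TH_SYN | TH_ACK, 0) == 1);
            assert(tcp_seg_is_prio(TH_ACK, 0) == 1);            /* pure ACK */
            assert(tcp_seg_is_prio(TH_ACK, 1448) == 0);         /* data segment */
            assert(tcp_seg_is_prio(TH_ACK | TH_FIN, 0) == 0);   /* FIN */
            tcp_prio_synack = 0;
            assert(tcp_seg_is_prio(TH_SYN, 0) == 0);            /* sysctl off */
            printf("OK\n");
            return 0;
    }
    ```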

    The performance improvement:

    The test environment (all three boxes are using Intel i7-2600 w/ HT
    enabled):

                              |     |
                    +->- emx1 |  B  | TCP_MAERTS
    +-----+         |         |     |
    |     |         |         +-----+
    |  A  | bnx0 ---+
    |     |         |         +-----+
    +-----+         |         |     |
                    +-<- emx1 |  C  | TCP_STREAM/TCP_RR
                              |     |

    A's kernel has this commit compiled in.  bnx0 has all four
    transmission queues enabled.  For bnx0, the hardware's transmission
    queue round-robin is on TSO segment boundaries.

    Some baseline measurements:
    B<--A TCP_MAERTS (raw stats) (128 client): 984 Mbps
        (tcp_stream -H A -l 15 -i 128 -r)
    C-->A TCP_STREAM (128 client): 942 Mbps (tcp_stream -H A -l 15 -i 128)
    C-->A TCP_CC (768 client): 221199 conns/s (tcp_cc -H A -l 15 -i 768)

    To effectively measure TCP_CC, the prefix route's MSL is changed to
    10ms: route change -msl 10

    All stats gathered in the following measurements are below the
    baseline measurements (well, they should be).

    C-->A TCP_CC improvement while the B<--A TCP_MAERTS test is running:
                            TCP_MAERTS(raw)  TCP_CC
    TSO prio_synack=1       948 Mbps         15988 conns/s
    TSO prio_synack=0       965 Mbps          8867 conns/s
    non-TSO prio_synack=1   943 Mbps         18128 conns/s
    non-TSO prio_synack=0   959 Mbps         11371 conns/s
    * 80% TCP_CC performance improvement w/ TSO and 60% w/o TSO!

    C-->A TCP_STREAM improvement while the B<--A TCP_MAERTS test is
    running:
                            TCP_MAERTS(raw)  TCP_STREAM
    TSO prio_synack=1       969 Mbps         920 Mbps
    TSO prio_synack=0       969 Mbps         865 Mbps
    non-TSO prio_synack=1   969 Mbps         920 Mbps
    non-TSO prio_synack=0   969 Mbps         879 Mbps
    * 6% TCP_STREAM performance improvement w/ TSO and 4% w/o TSO.

Summary of changes:
 sys/net/altq/if_altq.h                         | 26 ++++++-
 sys/net/if.c                                   | 95 ++++++++++++++++++++++----
 sys/netinet/tcp_output.c                       | 11 +++
 sys/netinet/tcp_syncache.c                     |  2 +
 sys/netinet/tcp_var.h                          |  1 +
 sys/netproto/802_11/ieee80211_dragonfly.h      |  9 +--
 sys/netproto/802_11/wlan/ieee80211_dragonfly.c | 31 ++++++++-
 sys/sys/mbuf.h                                 |  3 +-
 8 files changed, 153 insertions(+), 25 deletions(-)

DragonFly BSD source repository
