10G network netperf performance (ix)

Sepherosa Ziehau sepherosa at gmail.com
Fri Mar 14 05:02:25 PDT 2014


On Fri, Mar 14, 2014 at 6:10 PM, Dongsheng Song
<dongsheng.song at gmail.com> wrote:
> On Thu, Mar 13, 2014 at 9:26 PM, Sepherosa Ziehau <sepherosa at gmail.com> wrote:
>>
>> Hi all,
>>
>> Following stats are for folks interested in DragonFly's TCP netperf
>> performance on 10G network (as of 9f1b012):
>>
>> Testing system hardware:
>> Host: i7-3770 w/ hyperthreading enabled, dual channel DDR3-1600 memory (8GB x 2)
>> NIC: Intel 82599ES (connected w/ Intel XDACBL1M direct attach cable)
>>
>> The TSO burst size defaults to 12000B for DragonFly's ix.
>>
>> +-------+              +-------+
>> |       |              |       |
>> |       | ix0 ---- ix0 |       |
>> |   A   |              |   B   |
>> |       | ix1 ---- ix1 |       |
>> |       |              |       |
>> +-------+              +-------+
>>
>> B runs 'netserver -N'
>>
>> 1) TCP_STREAM (total 18840Mbps, 2 ports, 5 run average):
>>
>>    tcp_stream -H B0 -i 64 -l 60 &
>>    tcp_stream -H B1 -i 64 -l 60
>>
>>    The above commands start 128 netperf TCP_STREAM tests in total
>> (64 against B0 and 64 against B1).
>>
>>    The results:
>>    ~9424Mbps for each set of tests, i.e. ~18840Mbps total (5 run average).
>>    Jain's fairness index for each set of tests is > 0.85 (1.0 is the best).
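>>
>>    For reference, Jain's index over per-test throughputs x_1..x_n is
>>    (sum x_i)^2 / (n * sum x_i^2).  A minimal C sketch of the
>>    computation (not the test harness' own code):
>>
>>    /* 1.0 means every test got an equal share of the bandwidth. */
>>    static double
>>    jain_index(const double *x, int n)
>>    {
>>            double sum = 0.0, sqsum = 0.0;
>>            int i;
>>
>>            for (i = 0; i < n; ++i) {
>>                    sum += x[i];
>>                    sqsum += x[i] * x[i];
>>            }
>>            return (sum * sum) / ((double)n * sqsum);
>>    }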
>>
>>    CPU usage statistics:
>>    On TX side (A): ~25% sys, ~2% user, ~7% intr.  Almost no contention.
>>    On RX side (B): ~35% sys, ~3% user, ~10% intr.  Mainly contended on rcvtok.
>>    The interrupt rate is ~16000/s on each CPU (interrupt moderation
>> defaults to 8000Hz for DragonFly's ix, so the two active ports together
>> account for the ~16000/s per CPU).
>>
>> 2) TCP_STREAM + TCP_MAERTS (total 37279Mbps, 2 ports, 5 run average):
>>
>>    tcp_stream -H B0 -i 32 -l 60 &
>>    tcp_stream -H B1 -i 32 -l 60 &
>>    tcp_stream -H B0 -i 32 -l 60 -r &
>>    tcp_stream -H B1 -i 32 -l 60 -r
>>
>>    The above commands start 64 netperf TCP_STREAM and 64 TCP_MAERTS
>> tests against B0 and B1 (32 of each per port).
>>
>>    The results:
>>    ~9220Mbps - ~9400Mbps for each set of tests, i.e. ~37279Mbps total
>> (5 run average).
>>    Jain's fairness index for each set of tests is > 0.80 (1.0 is the best).
>>
>>    CPU usage statistics:
>>    ~75% sys, ~4% user, ~20% intr.  Mainly contended on rcvtok.  The
>> tests are CPU limited, but the system stays responsive throughout.
>>    The interrupt rate is ~16000/s on each CPU (interrupt moderation
>> defaults to 8000Hz for DragonFly's ix).
>>
>> Best Regards,
>> sephe
>>
>> --
>> Tomorrow Will Never Die
>
> Thanks, could you post TCP_RR data?

I am not sure whether TCP_RR is really useful here, since each process
works on only one socket.  However, I do have some statistics for
tools/tools/netrate/accept_connect/kq_connect_client.  It sustains
273Kconns/s (TCP connections; 8 processes, each trying to create 128
connections).  The server side is
tools/tools/netrate/accept_connect/kq_accept_server (run w/ -r, i.e.
SO_REUSEPORT).  MSL is set to 10ms for the testing network and
net.inet.ip.portrange.last is set to 40000.
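
For a feel of what the server side does, here is a minimal sketch of a
kqueue-based SO_REUSEPORT accept loop in the spirit of kq_accept_server
-r (the port number is made up; the real tool additionally forks one
process per CPU and counts the accept rate):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <err.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            struct sockaddr_in sin;
            struct kevent kev;
            int s, kq, on = 1;

            s = socket(AF_INET, SOCK_STREAM, 0);
            if (s < 0)
                    err(1, "socket");
            /* -r: let multiple processes bind the same port */
            if (setsockopt(s, SOL_SOCKET, SO_REUSEPORT, &on,
                sizeof(on)) < 0)
                    err(1, "setsockopt(SO_REUSEPORT)");

            memset(&sin, 0, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_addr.s_addr = htonl(INADDR_ANY);
            sin.sin_port = htons(5000);     /* port is made up */
            if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
                    err(1, "bind");
            if (listen(s, 1024) < 0)
                    err(1, "listen");

            kq = kqueue();
            if (kq < 0)
                    err(1, "kqueue");
            EV_SET(&kev, s, EVFILT_READ, EV_ADD, 0, 0, NULL);
            if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
                    err(1, "kevent(EV_ADD)");

            for (;;) {
                    int fd;

                    if (kevent(kq, NULL, 0, &kev, 1, NULL) < 0)
                            err(1, "kevent");
                    /* accept and close right away; only conns/s matters */
                    fd = accept(s, NULL, NULL);
                    if (fd >= 0)
                            close(fd);
            }
    }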

When doing 273Kconns/s, the client side consumes 100% CPU (the system
is still responsive, though), mainly contended on tcp_port_token (350K
contentions/s on each CPU).  The server side has ~45% idle time on each
CPU; its contention is pretty low, mainly on the ip_id spinlock.

The tcp_port_token contention is one of the major reasons we can't
push 335Kconns/s from _one_ client.  Another is the computational cost
of the software Toeplitz hash on the client side; on the server side,
the Toeplitz hash is calculated by the hardware.  I am currently
working on reducing the tcp_port_token contention.
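
For reference, a minimal sketch of the kind of Toeplitz hash the client
has to compute in software (illustrative only; the kernel's actual
implementation differs in details):

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Toeplitz hash over 'len' bytes of input; 'key' must be at least
     * len + 4 bytes long (the standard RSS key is 40 bytes).  For each
     * set input bit, XOR in the current 32-bit window of the key, then
     * slide the window left by one bit.
     */
    static uint32_t
    toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
    {
            uint32_t hash = 0, win;
            size_t i;
            int b;

            win = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                  ((uint32_t)key[2] << 8) | key[3];
            for (i = 0; i < len; ++i) {
                    for (b = 0; b < 8; ++b) {
                            if (data[i] & (0x80 >> b))
                                    hash ^= win;
                            win = (win << 1) |
                                  ((key[i + 4] >> (7 - b)) & 1);
                    }
            }
            return (hash);
    }

For an IPv4 TCP 4-tuple the input is 12 bytes (addresses and ports in
network byte order); the bit-at-a-time loop above is where the
client-side computational cost comes from.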

Best Regards,
sephe

-- 
Tomorrow Will Never Die
