10G network netperf performance (ix)
Sepherosa Ziehau
sepherosa at gmail.com
Fri Mar 14 05:02:25 PDT 2014
On Fri, Mar 14, 2014 at 6:10 PM, Dongsheng Song
<dongsheng.song at gmail.com> wrote:
> On Thu, Mar 13, 2014 at 9:26 PM, Sepherosa Ziehau <sepherosa at gmail.com> wrote:
>>
>> Hi all,
>>
>> The following stats are for folks interested in DragonFly's TCP netperf
>> performance on a 10G network (as of 9f1b012):
>>
>> Testing system hardware:
>> Host: i7-3770 w/ hyperthreading enabled, dual channel DDR3-1600 memory (8GB x 2)
>> NIC: Intel 82599ES (connected w/ Intel XDACBL1M direct attach cable)
>>
>> The TSO burst size defaults to 12000B for DragonFly's ix.
>>
>> +-------+ +-------+
>> | | | |
>> | | ix0 ---- ix0 | |
>> | A | | B |
>> | | ix1 ---- ix1 | |
>> | | | |
>> +-------+ +-------+
>>
>> B runs 'netserver -N'
>>
>> 1) TCP_STREAM (total 18840Mbps, 2 ports, 5-run average):
>>
>> tcp_stream -H B0 -i 64 -l 60 &
>> tcp_stream -H B1 -i 64 -l 60
>>
>> The above commands start 128 netperf TCP_STREAM tests (64 per port)
>> against B0 and B1.
>>
>> The results:
>> ~9424Mbps for each set of tests, i.e. ~18840Mbps in total (5-run average).
>> Jain's fairness index for each set of tests > 0.85 (1.0 is the best).
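>>
>> For reference, Jain's fairness index over n per-connection throughputs
>> x[1..n] is (sum x)^2 / (n * sum x^2), reaching 1.0 when every
>> connection gets an equal share.  A minimal C sketch of the computation
>> (for illustration only; not part of the netperf tools):
>>
>> #include <stddef.h>
>>
>> /* Jain's fairness index over n per-connection throughputs. */
>> static double
>> jain_index(const double *x, size_t n)
>> {
>> 	double sum = 0.0, sumsq = 0.0;
>> 	size_t i;
>>
>> 	for (i = 0; i < n; ++i) {
>> 		sum += x[i];
>> 		sumsq += x[i] * x[i];
>> 	}
>> 	return ((sum * sum) / ((double)n * sumsq));
>> }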
>>
>> CPU usage statistics:
>> On the TX side (A): ~25% sys, ~2% user, ~7% intr. Almost no contention.
>> On the RX side (B): ~35% sys, ~3% user, ~10% intr. Mainly contended on rcvtok.
>> The interrupt rate is ~16000/s on each CPU (interrupt moderation
>> defaults to 8000Hz for DragonFly's ix; with both ports active that is
>> presumably ~8000/s per port).
>>
>> 2) TCP_STREAM + TCP_MAERTS (total 37279Mbps, 2 ports, 5-run average):
>>
>> tcp_stream -H B0 -i 32 -l 60 &
>> tcp_stream -H B1 -i 32 -l 60 &
>> tcp_stream -H B0 -i 32 -l 60 -r &
>> tcp_stream -H B1 -i 32 -l 60 -r
>>
>> The above commands start 64 netperf TCP_STREAM and 64 TCP_MAERTS
>> tests against B0 and B1 (TCP_MAERTS is TCP_STREAM with the data
>> flowing in the reverse direction).
>>
>> The results:
>> ~9220Mbps - ~9400Mbps for each set of tests, i.e. ~37279Mbps in total
>> (5-run average).
>> Jain's fairness index for each set of tests > 0.80 (1.0 is the best).
>>
>> CPU usage statistics:
>> ~75% sys, ~4% user, ~20% intr. Mainly contended on rcvtok. The
>> tests are CPU-limited, but the system is still responsive during the
>> test. The interrupt rate is ~16000/s on each CPU (interrupt
>> moderation defaults to 8000Hz for DragonFly's ix).
>>
>> Best Regards,
>> sephe
>>
>> --
>> Tomorrow Will Never Die
>
> Thanks, could you post TCP_RR data?
I am not sure whether TCP_RR is really useful here, since each process
works on only one socket.  However, I have some statistics for
tools/tools/netrate/accept_connect/kq_connect_client.  It does
273Kconns/s (TCP connections; 8 processes, each trying to create 128
connections).  The server side is
tools/tools/netrate/accept_connect/kq_accept_server (run w/ -r, i.e.
SO_REUSEPORT).  The MSL is set to 10ms for the testing network and
net.inet.ip.portrange.last is set to 40000.
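
For readers unfamiliar with the server side, its core is roughly the
following kqueue accept loop.  This is a from-scratch sketch under my
own assumptions (arbitrary port 9000, single process shown), not the
actual kq_accept_server code:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sin;
	struct kevent kev;
	int s, kq, fd, on = 1;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s < 0)
		err(1, "socket");
	/* SO_REUSEPORT lets one listen socket per process share the port. */
	if (setsockopt(s, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) < 0)
		err(1, "setsockopt");
	if (fcntl(s, F_SETFL, O_NONBLOCK) < 0)
		err(1, "fcntl");

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(9000);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		err(1, "bind");
	if (listen(s, 512) < 0)
		err(1, "listen");

	kq = kqueue();
	if (kq < 0)
		err(1, "kqueue");
	EV_SET(&kev, s, EVFILT_READ, EV_ADD, 0, 0, NULL);
	if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
		err(1, "kevent add");

	for (;;) {
		/* Wait until connections are pending, then drain them. */
		if (kevent(kq, NULL, 0, &kev, 1, NULL) < 0)
			err(1, "kevent wait");
		for (;;) {
			fd = accept(s, NULL, NULL);
			if (fd < 0)	/* EWOULDBLOCK: queue drained */
				break;
			/* Connection rate test: just close it. */
			close(fd);
		}
	}
}

Run one such process per CPU; with SO_REUSEPORT the stack distributes
the incoming connections across the listen sockets.
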
When doing 273Kconns/s, the client side consumes 100% CPU (the system
is still responsive though), mainly contended on tcp_port_token (350K
contentions/s on each CPU).  The server side has ~45% idle time on
each CPU; contention is pretty low, mainly on the ip_id spinlock.
The tcp_port_token contention is one of the major reasons we can't
push 335Kconns/s from _one_ client.  Another is the computational cost
of the software Toeplitz hash on the client side; on the server side
the Toeplitz hash is calculated by the hardware.  I am currently
working on reducing the tcp_port_token contention.
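
To give an idea of the software Toeplitz cost: for every connection it
initiates, the client folds the connection's 12-byte IPv4/TCP 4-tuple
bit-by-bit against a 40-byte RSS key.  A generic sketch of that hash
(using the standard Microsoft RSS example key; not DragonFly's actual
implementation):

#include <stddef.h>
#include <stdint.h>

static const uint8_t rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * Toeplitz hash over 'data' (e.g. saddr, daddr, sport, dport in
 * network byte order, len = 12; len must be <= 36 with a 40-byte key).
 */
static uint32_t
toeplitz_hash(const uint8_t *data, size_t len)
{
	uint32_t hash = 0, v;
	size_t i;
	int b;

	/* v is the sliding 32-bit window over the key bits. */
	v = ((uint32_t)rss_key[0] << 24) | ((uint32_t)rss_key[1] << 16) |
	    ((uint32_t)rss_key[2] << 8) | rss_key[3];
	for (i = 0; i < len; ++i) {
		for (b = 7; b >= 0; --b) {
			if (data[i] & (1 << b))
				hash ^= v;
			/* Shift in the next key bit, MSB first. */
			v <<= 1;
			if (rss_key[i + 4] & (1 << b))
				v |= 1;
		}
	}
	return (hash);
}

The result is then used to pick the target CPU, e.g. something like
toeplitz_hash(tuple, 12) % ncpus (hypothetical usage), which is why
every connect() attempt pays this bit-serial cost in software.
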
Best Regards,
sephe
--
Tomorrow Will Never Die