IP forwarding performance (git.2aa7f7f, normal 4.20Mpps, fast 5.07Mpps)

Thu Dec 27 00:42:58 PST 2012

Hi all,

Before I move on to the next big ticket (multiple TX queue support),
here is the performance I currently get as of git 2aa7f7f.

Quick summary: the IFQ packet staging mechanism gives me:
+80Kpps for two sets of bidirectional normal IP forwarding (now 4.20Mpps)
+30Kpps for two sets of bidirectional fast IP forwarding (now 5.07Mpps)
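
As a quick sanity check on the summary, the gains follow from the totals quoted inline in this mail (4.12Mpps and 5.04Mpps at git 7e1fbcf versus 4.20Mpps and 5.07Mpps now).  A minimal sketch in Python; the names are only for illustration, and rates are kept in Kpps so the arithmetic stays in integers:

```python
# Totals in Kpps, taken from the measurements quoted in this mail.
# The dictionary layout and names are illustrative only.
MEASUREMENTS = {
    "normal": {"before": 4120, "after": 4200},  # git 7e1fbcf vs git 2aa7f7f
    "fast": {"before": 5040, "after": 5070},
}


def gain_kpps(before, after):
    """Absolute gain in Kpps."""
    return after - before


def gain_percent(before, after):
    """Relative gain as a percentage of the earlier total."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    for mode, m in MEASUREMENTS.items():
        delta = gain_kpps(m["before"], m["after"])
        pct = gain_percent(m["before"], m["after"])
        print(f"{mode}: +{delta}Kpps ({pct:.1f}%)")
```

In relative terms that is roughly 2% for normal forwarding and well under 1% for fast forwarding.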

For details, please read the inline comments below.

On Thu, Dec 20, 2012 at 3:03 PM, Sepherosa Ziehau <sepherosa at gmail.com> wrote:
> On Fri, Dec 14, 2012 at 5:47 PM, Sepherosa Ziehau <sepherosa at gmail.com> wrote:
>> Hi all,
>>
>> This email serves as the base performance measurement for further
>> network stack optimization (as of git 107282b).
>
> Since bidirectional fast IP forwarding already maxes out the GigE
> limit, I increased the measurement load a bit.  The new measurement
> is against git 7e1fbcf.
>
>>
>>
>> The hardware:
>> mobo ASUS P867H-M
>> 4x4G DDR3 memory
>> CPU i7-2600 (w/ HT and Turbo Boost enabled, 4C/8T)
>> Forwarding NIC Intel 82576EB dual copper
>
> The forwarding NIC is now changed to 82580EB quad copper.
>
>> Packet generator NICs Intel 82571EB dual copper
>>
>>
>> A emx1 <---> igb0 forwarder igb1 <---> emx1 B
>
> The testing topology is changed to the following configuration:
> +---+                 +-----------+                 +---+
> |   | emx1 <---> igb0 |           | igb1 <---> emx1 |   |
> | A |                 | forwarder |                 | B |
> |   | emx2 <---> igb2 |           | igb3 <---> emx2 |   |
> +---+                 +-----------+                 +---+
>
> Streams:
> A.emx1 <---> B.emx1 (bidirectional)
> A.emx2 <---> B.emx2 (bidirectional)
>
>>
>> A and "forwarder", B and "forwarder" are directly connected using CAT6 cables.
>> Polling(4) is enabled on igb1 and igb0 on "forwarder".  The following
>> tunables are in /boot/loader.conf:
>> kern.ipc.nmbclusters="524288"
>> net.ifpoll.user_frac="10"
>> net.ifpoll.status_frac="1000"

net.link.ifq_stage_cntmax="8"

>> The following sysctl is changed before putting igb1 into polling mode:
>> sysctl hw.igb1.npoll_txoff=4
>
> sysctl hw.igb1.npoll_txoff=1
> sysctl hw.igb2.npoll_txoff=2
> sysctl hw.igb3.npoll_txoff=3

sysctl hw.igb0.tx_wreg_nsegs=16
sysctl hw.igb1.tx_wreg_nsegs=16
sysctl hw.igb2.tx_wreg_nsegs=16
sysctl hw.igb3.tx_wreg_nsegs=16

>
>>
>>
>> First for the users that are only interested in the bulk forwarding
>> performance:  The 32 netperf TCP_STREAMs running on A could do
>> 941Mbps.
>>
>>
>> Now the tiny packets forwarding performance:
>>
>> A and B generate 18-byte UDP datagrams using
>> tools/tools/netrate/pktgen.  The destination addresses of the UDP
>> datagrams are selected so that the generated datagrams are evenly
>> distributed across the 8 RX queues, which should be common in a
>> production environment.
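
For reference, an 18-byte UDP payload is exactly what yields a minimum-sized Ethernet frame, which is why these rates can be compared against the GigE packet-rate ceiling.  The sketch below works that out from the standard Ethernet/IPv4/UDP overheads, assuming untagged frames and no IP options; the constant and function names are mine, not from pktgen:

```python
# Standard sizes in bytes (IPv4 without options, untagged Ethernet).
UDP_HDR = 8
IPV4_HDR = 20
ETH_HDR = 14
ETH_FCS = 4
PREAMBLE_SFD = 8     # preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12
MIN_FRAME = 64       # minimum Ethernet frame, FCS included


def frame_bytes(udp_payload):
    """Ethernet frame size (header through FCS) for a UDP payload."""
    frame = udp_payload + UDP_HDR + IPV4_HDR + ETH_HDR + ETH_FCS
    return max(frame, MIN_FRAME)  # short frames are padded to the minimum


def line_rate_pps(udp_payload, link_bps=1_000_000_000):
    """Maximum packets per second in one direction of the link."""
    wire_bytes = frame_bytes(udp_payload) + PREAMBLE_SFD + INTERFRAME_GAP
    return link_bps / (wire_bytes * 8)


if __name__ == "__main__":
    print("frame size:", frame_bytes(18), "bytes")
    print("GigE ceiling: %.3f Mpps per direction" % (line_rate_pps(18) / 1e6))
```

With 64-byte frames the ceiling is about 1.488Mpps per direction on each GigE link, so the 1.48Mpps fast forwarding figure in the first measurement is essentially line rate.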
>>
>> Bidirectional normal IP forwarding:
>> 1.42Mpps in each direction, so 2.84Mpps are forwarded in total.
>> CPU usage:
>> On CPUs that are doing TX in addition to RX: 85% ~ 90% (max allowed by
>> polling's user_frac)
>> On CPUs that are only doing RX: 40% ~ 50%
>
> Two sets of bidirectional normal IP forwarding:
> 1.03Mpps in each direction, so 4.12Mpps are forwarded in total.

1.05+Mpps in each direction, so 4.20Mpps are forwarded in total.

> CPU usage:
> On CPUs that are doing TX in addition to RX: 90% (max allowed by
> polling's user_frac)
> On CPUs that are only doing RX: 70% ~ 80%

Not much improvement in CPU usage.

> IPI rate on CPUs that are doing TX in addition to RX: ~10K/s

IPI rate on CPUs that are doing TX in addition to RX: ~4.5K/s

>
>>
>> Bidirectional fast IP forwarding: (net.inet.ip.fastforwarding=1)
>> 1.48Mpps in each direction, so 2.96Mpps are forwarded in total.
>> CPU usage:
>> On CPUs that are doing TX in addition to RX: 65% ~ 70%
>> On CPUs that are only doing RX: 30% ~ 40%
>
> Two sets of bidirectional fast IP forwarding: (net.inet.ip.fastforwarding=1)
> 1.26Mpps in each direction, so 5.04Mpps are forwarded in total.

~1.27Mpps in each direction, so 5.07Mpps are forwarded in total.

> CPU usage:
> On CPUs that are doing TX in addition to RX: 90% (max allowed by
> polling's user_frac)
> On CPUs that are only doing RX: 60% ~ 70%

Not much improvement in CPU usage.

> IPI rate on CPUs that are doing TX in addition to RX: ~10K/s

IPI rate on CPUs that are doing TX in addition to RX: ~5K/s

Best Regards,
sephe

--
Tomorrow Will Never Die