Network improvement 4.8 -> 5.0 for short-lived HTTP/1.1 workload.
Sepherosa Ziehau
sepherosa at gmail.com
Tue Oct 17 07:57:11 PDT 2017
In this release cycle, several changes were committed to improve
performance and to reduce and stabilize latency.
Test setup: 30K concurrent connections, 1 request/connection, 1KB web object.
The server has 24 hardware threads (HT).
Baseline (32 nginx workers w/ 16 netisrs):
performance 215907.25tps, latency-avg 33.11ms, latency-stdev 41.76ms,
latency-99% 192.36ms.
The performance for 16 nginx workers is too low to serve as the baseline
(16 nginx workers w/ 16 netisrs):
performance 191920.81tps, latency-avg 32.04ms, latency-stdev 25.15ms,
latency-99% 101.37ms.
===================
Make the # of netisrs tunable.
If the # of netisrs is set to ncpus, this allows two optimized settings in nginx:
1) Make the # of nginx workers the same as the # of netisrs.
2) CPU-bind the nginx workers.
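As a sketch, the two settings above might look like this. The loader tunable name is an assumption (check your kernel's actual tunable); worker_processes and worker_cpu_affinity are standard nginx directives:

```conf
# /boot/loader.conf -- tunable name assumed, verify against your kernel
net.netisr.ncpus=24

# nginx.conf -- one worker per netisr, each pinned to its own CPU
worker_processes 24;
worker_cpu_affinity auto;   # automatic per-CPU binding, nginx 1.9.10+
```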
24 nginx workers w/ 24 netisrs:
performance 212556.02tps, latency-avg 56.18ms, latency-stdev 7.90ms,
latency-99% 70.31ms.
24 nginx workers w/ 24 netisrs, cpu-bound nginx workers:
performance 210658.80tps, latency-avg 58.01ms, latency-stdev 5.20ms,
latency-99% 68.73ms.
As you can see, performance dropped a bit. Though average latency
increased, latency is significantly stabilized.
===================
Limit the # of acceptable sockets returned by kevent(2).
24 nginx workers w/ 24 netisrs, cpu-bound nginx workers:
performance 217599.01tps, latency-avg 32.00ms, latency-stdev 2.35ms,
latency-99% 35.59ms.
Compared w/ the baseline, performance improved a bit and average latency
is reduced a bit. More importantly, latency is significantly stabilized.
===================
Summary of the comparison across different web object sizes (latency values in ms):
1KB web object
| perf (tps) | lat-avg | lat-stdev | lat-99%
-------------+------------+---------+-----------+---------
baseline | 215907.25 | 33.11 | 41.76 | 192.36
-------------+------------+---------+-----------+---------
netisr_ncpus | 210658.80 | 58.01 | 5.20 | 68.73
-------------+------------+---------+-----------+---------
kevent.data | 217599.01 | 32.00 | 2.35 | 35.59
8KB web object
| perf (tps) | lat-avg | lat-stdev | lat-99%
-------------+------------+---------+-----------+---------
baseline | 182719.03 | 42.62 | 58.70 | 250.51
-------------+------------+---------+-----------+---------
netisr_ncpus | 181201.11 | 68.78 | 6.43 | 80.68
-------------+------------+---------+-----------+---------
kevent.data | 186324.41 | 37.41 | 4.81 | 48.69
16KB web object
| perf (tps) | lat-avg | lat-stdev | lat-99%
-------------+------------+---------+-----------+---------
baseline | 138625.67 | 72.01 | 65.78 | 304.78
-------------+------------+---------+-----------+---------
netisr_ncpus | 138323.40 | 93.61 | 16.30 | 137.12
-------------+------------+---------+-----------+---------
kevent.data | 138778.11 | 60.90 | 11.80 | 92.07
So performance improved a bit, latency-avg is reduced by 3%~15%,
latency-stdev is reduced by 82%~94%, and latency-99% is reduced by
69%~81%!
+++++++++++++++
And as a bonus, forwarding performance also improved! We can now
do 13.2Mpps (bidirectional forwarding, output packet count) w/
fastforwarding, and 11Mpps w/ normal forwarding.
Thanks,
sephe
--
Tomorrow Will Never Die