DP performance
Matthew Dillon
dillon at apollo.backplane.com
Tue Nov 29 10:43:06 PST 2005
:Should we be really that pessimistic about potential MP performance,
:even with two NICs only? Typically packet flows are bi-directional,
:and if we could have one CPU/core taking care of one direction, then
:there should be at least some room for parallelism, especially once the
:parallelized routing tables see the light. Of course provided that
:each NIC is handled by a separate core, and that IPC doesn't become the
:actual bottleneck.
The problem is that if you only have two interfaces, every incoming
packet being routed has to go through both interfaces, which means
that there will be significant memory contention between the two cpus
no matter what you do. This won't degrade the 2xCPUs by 50%... it's
probably more like 20%, but if you only have two ethernet interfaces
and the purpose of the box is to route packets, there isn't much of a
reason to make it an SMP box. cpu's these days are far, far faster than
two measly GigE ethernet interfaces that can only do 200 MBytes/sec each.
Even more to the point, if you have two interfaces you still only have
200 MBytes/sec worth of packets to contend with, even though each incoming
packet is being shoved out the other interface (for 400 MBytes/sec of
total network traffic). It is still only *one* packet that the cpu is
routing. Even cheap modern cpus can shove around several GBytes/sec
without DMA so 200 MBytes/sec is really nothing to them.
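The arithmetic above can be sketched in a few lines. The 200 MBytes/sec
per-interface figure is the one used in the post; the CPU copy bandwidth
is an assumed illustrative value standing in for "several GBytes/sec":

```python
# Back-of-envelope check of the routing-load argument.
link_rate_mb = 200               # MB/s of incoming packets per interface (from the post)
routed_mb = link_rate_mb         # each packet is routed once: in one side, out the other
wire_total_mb = 2 * link_rate_mb # total traffic seen across both interfaces
cpu_copy_gb = 4                  # assumed CPU copy bandwidth, GB/s ("several GB/s")

fraction = routed_mb / (cpu_copy_gb * 1000)
print(f"routed: {routed_mb} MB/s, on the wire: {wire_total_mb} MB/s")
print(f"routed load as a fraction of CPU copy bandwidth: {fraction:.0%}")
```

Even with a conservative copy-bandwidth assumption, the routed stream is
only a few percent of what one cpu can move.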
:> Main memory bandwidth used to be an issue but isn't so much any
:> more.
:
:The memory bandwidth isn't but latency _is_ now the major performance
:bottleneck, IMO. DRAM access latencies are now in 50 ns range and will
:not noticeably decrease in the foreseeable future. Consider the amount
:of independent memory accesses that need to be performed on per-packet
:...
:Cheers
:
:Marko
No, this is irrelevant. All modern ethernet devices (for the last decade
or more) have DMA engines and fairly significant FIFOs, which means that
nearly all memory accesses are going to be burst accesses capable of
getting fairly close to the maximum burst bandwidth of the memory. I
can't say for sure that this is actually happening without putting
a logic analyzer on the memory bus, but I'm fairly sure it is. I seem
to recall that the PCI (PCIx, PCIe, etc) bus DMA protocols are all burst
capable protocols.
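A rough per-packet latency budget shows why the 50 ns DRAM figure from
the quoted mail need not dominate at these rates. The 1500-byte frame
size is an assumption (full-size Ethernet frames); the 200 MBytes/sec
and ~50 ns numbers come from the thread:

```python
# Per-packet time budget vs. DRAM access latency.
rate_bytes = 200e6   # routed traffic in bytes/sec (from the post)
pkt_bytes = 1500     # assumed full-size Ethernet frames
dram_ns = 50         # DRAM access latency cited in the quoted mail

pps = rate_bytes / pkt_bytes       # packets per second
budget_ns = 1e9 / pps              # time available per packet
print(f"{pps:,.0f} packets/s, {budget_ns:,.0f} ns per packet")
print(f"~{budget_ns / dram_ns:.0f} uncached DRAM accesses fit in each packet's budget")
```

At full-size frames there is room for on the order of 150 uncached
accesses per packet, and burst DMA means most accesses aren't isolated
random latencies anyway.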
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>