DP performance

Danial Thom danial_thom at yahoo.com
Wed Nov 30 07:10:49 PST 2005



--- Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:

> 
> :Should we be really that pessimistic about potential MP
> :performance, even with two NICs only?  Typically packet flows are
> :bi-directional, and if we could have one CPU/core taking care of
> :one direction, then there should be at least some room for
> :parallelism, especially once the parallelized routing tables see
> :the light.  Of course provided that each NIC is handled by a
> :separate core, and that IPC doesn't become the actual bottleneck.
> 
>     The problem is that if you only have two interfaces, every
>     incoming packet being routed has to go through both interfaces,
>     which means that there will be significant memory contention
>     between the two cpus no matter what you do.  This won't degrade
>     the 2xCPUs by 50%... it's probably more like 20%, but if you only
>     have two ethernet interfaces and the purpose of the box is to
>     route packets, there isn't much of a reason to make it an SMP
>     box.  cpu's these days are far, far faster than two measly GigE
>     ethernet interfaces that can only do 200 MBytes/sec each.
> 
>     Even more to the point, if you have two interfaces you still only
>     have 200 MBytes/sec worth of packets to contend with, even though
>     each incoming packet is being shoved out the other interface (for
>     400 MBytes/sec of total network traffic).  It is still only *one*
>     packet that the cpu is routing.  Even cheap modern cpus can shove
>     around several GBytes/sec without DMA so 200 MBytes/sec is really
>     nothing to them.

MBytes/sec is not the relevant measure; it's pps. It's the iterations
that are the limiting factor, particularly if you are acting on the
packet. The simplistic analysis of packet in / packet out is one thing,
but the expectation is that *some* operation is being carried out for
each packet, whether it's a firewall check or something even more
intensive. It's pretty rare these days to have a box that just moves
bytes from one interface to another without some value-added task;
otherwise you'd just get a switch and not use a unix-like box.
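
To put rough numbers on that (a back-of-the-envelope sketch; the 2 GHz
clock and the frame sizes are assumptions for illustration, not
measurements):

/* Packets/sec and per-packet CPU budget for one GigE port, one
 * direction.  The 2 GHz clock is an assumed figure for illustration. */
#include <stdio.h>

int main(void)
{
    const double line_rate_bps = 1e9;     /* GigE line rate                    */
    const double cpu_hz        = 2e9;     /* assumed 2 GHz CPU                 */
    const int    overhead      = 8 + 12;  /* preamble + inter-frame gap, bytes */
    const int    sizes[]       = { 64, 512, 1518 };  /* min / mid / max frames */

    for (int i = 0; i < 3; i++) {
        double bits_on_wire   = (sizes[i] + overhead) * 8.0;
        double pps            = line_rate_bps / bits_on_wire;
        double cycles_per_pkt = cpu_hz / pps;
        printf("%4d-byte frames: %9.0f pps, ~%6.0f cpu cycles per packet\n",
               sizes[i], pps, cycles_per_pkt);
    }
    return 0;
}

At minimum-size frames that's roughly 1.49 Mpps per port, i.e. on the
order of a thousand cycles per packet before any firewall or
routing-table work has been done, which is why the iteration count
rather than the byte count sets the ceiling.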

> 
> :>     Main memory bandwidth used to be an issue but isn't so much
> :>     any more.
> :
> :The memory bandwidth isn't, but latency _is_ now the major
> :performance bottleneck, IMO.  DRAM access latencies are now in the
> :50 ns range and will not noticeably decrease in the foreseeable
> :future.  Consider the amount of independent memory accesses that
> :need to be performed on per-packet
> :...
> :Cheers
> :
> :Marko
> 
>     No, this is irrelevant.  All modern ethernet devices (for the
>     last decade or more) have DMA engines and fairly significant
>     FIFOs, which means that nearly all memory accesses are going to
>     be burst accesses capable of getting fairly close to the maximum
>     burst bandwidth of the memory.  I can't say for sure that this is
>     actually happening without putting a logic analyzer on the memory
>     bus, but I'm fairly sure it is.  I seem to recall that the PCI
>     (PCI-X, PCIe, etc) bus DMA protocols are all burst-capable
>     protocols.

Typically only 64 bytes are "burst" at a time, max (and that only if
there are no other requests), so you're not always bursting the entire
frame. As the bus becomes more saturated, you get shorter and shorter
bursts. With 2 devices your "realizable" bus bandwidth is about 2/3 of
peak for unidirectional traffic and 1/2 for bidirectional. That puts
PCI-X (~8 Gb/s peak) just on the edge of being fully capable of
full-duplex GigE.
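
Working that out explicitly (a sketch only: it assumes 64-bit/133 MHz
PCI-X for the peak figure, and the 1/2 derating is the estimate above,
not a measurement):

/* Sketch of the bus-bandwidth arithmetic above.  Assumes 64-bit/133 MHz
 * PCI-X for the peak rate; the 1/2 bidirectional derating (2/3 for
 * unidirectional) is the estimate from the text, not a measured value. */
#include <stdio.h>

int main(void)
{
    const double pcix_peak_gbps = 64 * 133e6 / 1e9;     /* ~8.5 Gb/s theoretical  */
    const double usable_gbps    = pcix_peak_gbps * 0.5; /* 1/2 when bidirectional */

    /* Routing full-duplex GigE between two NICs on one bus: every packet
     * crosses the bus twice (RX DMA into memory, TX DMA back out), in
     * both directions of flow. */
    const double gige_fdx_gbps  = 2.0;                  /* 1 Gb/s each way          */
    const double bus_load_gbps  = gige_fdx_gbps * 2.0;  /* each packet crosses twice */

    printf("usable PCI-X bandwidth : %.1f Gb/s\n", usable_gbps);
    printf("bus load, routed GigE  : %.1f Gb/s\n", bus_load_gbps);
    return 0;
}

Roughly 4 Gb/s of bus load against roughly 4.3 Gb/s of usable
bandwidth, which is what "just on the edge" means here.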

DT