DP performance

Marko Zec zec at icir.org
Wed Nov 30 10:28:00 PST 2005


On Wednesday 30 November 2005 16:18, Danial Thom wrote:
> --- Hiten Pandya <hmp at xxxxxxxxxxxxx> wrote:
> > Marko Zec wrote:
> > > Should we really be that pessimistic about potential MP
> > > performance, even with two NICs only?  Typically packet flows
> > > are bi-directional, and if we could have one CPU/core taking
> > > care of one direction, then there should be at least some room
> > > for parallelism, especially once the parallelized routing tables
> > > see the light.  Of course, provided that each NIC is handled by
> > > a separate core, and that IPC doesn't become the actual
> > > bottleneck.
> >
> > On a similar note, it is important that we add *hardware* support
> > for binding a set of CPUs to particular interrupt lines.  I
> > believe the API support for CPU-affinitized interrupt threads is
> > already there, so only the hard work of converting the APIC code
> > from physical to logical access mode remains.
> >
> > I am not sure how the AMD64 platform handles CPU affinity; by
> > that I mean whether the same infrastructure put in place for i386
> > would work, with a few modifications here and there.  The recent
> > untangling of the interrupt code should make it simpler for
> > others to dig into adding interrupt affinity support.
>
> This, by itself, is not enough, albeit useful.  What you need to do
> is separate transmit and receive (which use the same interrupts, of
> course).  The only way to increase capacity for a single stream
> with MP is to separate tx and rx.
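
On the interrupt-affinity point first: below is a minimal sketch of
what binding a NIC's interrupt thread to one CPU might look like.
All the names here (intr_thread, intr_bind_cpu, apic_set_logical_dest,
NCPUS) are invented for illustration and are not an existing FreeBSD
or DragonFly API; the real work would happen in the APIC code
mentioned above.

#include <errno.h>
#include <stdio.h>

#define NCPUS 2                 /* assumption: a dual-CPU box */

/* Invented stand-in for the kernel's per-interrupt state. */
struct intr_thread {
        int it_irq;             /* interrupt line serviced */
        int it_cpu;             /* CPU this thread is pinned to */
};

/*
 * Stub: a real kernel would reprogram the I/O APIC entry for this
 * IRQ in logical destination mode so the vector targets 'cpu'.
 */
static void
apic_set_logical_dest(int irq, int cpu)
{
        printf("irq %d -> cpu %d\n", irq, cpu);
}

static int
intr_bind_cpu(struct intr_thread *it, int cpu)
{
        if (cpu < 0 || cpu >= NCPUS)
                return (EINVAL);
        it->it_cpu = cpu;
        apic_set_logical_dest(it->it_irq, cpu);
        return (0);
}

int
main(void)
{
        /* One NIC per core, as suggested above. */
        struct intr_thread em0 = { .it_irq = 16, .it_cpu = -1 };
        struct intr_thread em1 = { .it_irq = 17, .it_cpu = -1 };

        intr_bind_cpu(&em0, 0);
        intr_bind_cpu(&em1, 1);
        return (0);
}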

Unless you are doing fancy outbound queuing, which typically doesn't
make much sense at 1Gbit/s speeds and above, I'd bet that
significantly more CPU cycles are spent in the "RX" part than in the
"TX" part, which basically only has to enqueue a packet into the
device's DMA ring and recycle already-transmitted mbufs.  The other
issue with having separate CPUs handle the RX and TX sides of the
same interface would be the locking mess: you would end up with the
per-data-structure locking model of FreeBSD 5.0 and later, which
DragonFly diverged from.
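
To illustrate how little the TX side has to do, here is a
stripped-down sketch of a transmit path: one descriptor write to hand
the packet to the NIC, plus a completion pass that recycles mbufs.
The tx_ring layout and function names are simplified inventions, not
any real driver's code; only m_freem() mirrors the actual mbuf API.

#include <stdlib.h>

/* Simplified stand-ins for kernel structures (illustration only). */
struct mbuf { void *m_data; int m_len; };
static void m_freem(struct mbuf *m) { free(m); }

#define TX_RING_SIZE 256

struct tx_ring {
        struct mbuf *slot[TX_RING_SIZE]; /* mbuf behind each descriptor */
        int head;               /* next descriptor to give to the NIC */
        int tail;               /* oldest descriptor not yet completed */
};

/* TX enqueue: the whole "transmit" is one descriptor write. */
static int
tx_enqueue(struct tx_ring *r, struct mbuf *m)
{
        int next = (r->head + 1) % TX_RING_SIZE;

        if (next == r->tail)
                return (-1);    /* ring full; caller requeues */
        r->slot[r->head] = m;   /* NIC DMAs straight out of the mbuf */
        r->head = next;
        return (0);
}

/* TX completion: recycle mbufs the hardware has finished sending. */
static void
tx_complete(struct tx_ring *r, int done_upto)
{
        while (r->tail != done_upto) {
                m_freem(r->slot[r->tail]);
                r->slot[r->tail] = NULL;
                r->tail = (r->tail + 1) % TX_RING_SIZE;
        }
}

Contrast that with the RX side, which has to take an interrupt, pull
the packet off the ring, refill the buffer, and push the packet up
the whole protocol stack.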

Cheers,

Marko


> You'll still have higher latency than UP, but you may be able to
> increase capacity by dedicating cycles to processing the receive
> ring.  If you can eliminate overruns then you can selectively
> manage transmit.
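
On that last point, a dedicated RX loop might look roughly like the
sketch below.  Everything here (rx_ring, rx_desc_done, rx_refill,
stack_input) is invented for illustration; the idea is simply that
one core drains the ring in bounded bursts, so buffers are recycled
before the NIC runs out and starts dropping packets.

#define RX_RING_SIZE 256

struct mbuf;                    /* stand-in; see the TX sketch above */

struct rx_ring {
        struct mbuf *slot[RX_RING_SIZE];
        int next;               /* next descriptor to examine */
};

/* Hypothetical hooks a real driver would supply (stubbed here). */
static int  rx_desc_done(struct rx_ring *r, int i) { (void)r; (void)i; return (0); }
static void rx_refill(struct rx_ring *r, int i)    { (void)r; (void)i; }
static void stack_input(struct mbuf *m)            { (void)m; }

/*
 * Drain up to 'budget' packets in one pass.  Calling this often
 * enough from a dedicated core is what prevents ring overruns; only
 * then is there slack left to schedule transmit work.
 */
static int
rx_drain(struct rx_ring *r, int budget)
{
        int n = 0;

        while (n < budget && rx_desc_done(r, r->next)) {
                stack_input(r->slot[r->next]);  /* hand packet to the stack */
                rx_refill(r, r->next);          /* new buffer for the NIC */
                r->next = (r->next + 1) % RX_RING_SIZE;
                n++;
        }
        return (n);             /* packets handled this pass */
}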




