DP performance

Thu Dec 1 01:45:28 PST 2005

Marko Zec wrote:

> On Wednesday 30 November 2005 16:18, Danial Thom wrote:
>> --- Hiten Pandya <hmp at xxxxxxxxxxxxx> wrote:
>> > Marko Zec wrote:
>> > > Should we be really that pessimistic about potential MP performance,
>> > > even with two NICs only?  Typically packet flows are bi-directional,
>> > > and if we could have one CPU/core taking care of one direction, then
>> > > there should be at least some room for parallelism, especially once the
>> > > parallelized routing tables see the light.  Of course provided that
>> > > each NIC is handled by a separate core, and that IPC doesn't become the
>> > > actual bottleneck.
>> >
>> > On a similar note, it is important that we add the *hardware* support
>> > for binding a set of CPUs to particular interrupt lines.  I believe that
>> > the API support for CPU-affinitized interrupt threads is already there
>> > so only the hard work is left of converting the APIC code from physical
>> > to logical access mode.
>> >
>> > I am not sure how the AMD64 platform handles CPU affinity, by that I
>> > mean if the same infrastructure put in place for i386 would work or not
>> > with a few modifications here and there.  The recent untangling of the
>> > interrupt code should make it simpler for others to dig into adding
>> > interrupt affinity support.
>>
>> This, by itself, is not enough, albeit useful. What you need to do is
>> separate transmit and receive (which use the same interrupts, of
>> course). The only way to increase capacity for a single stream with MP
>> is to separate tx and rx.
> 
> Unless doing fancy outbound queuing, which typically doesn't make much
> sense at 1Gbit/s speeds and above, I'd bet that significantly more CPU
> cycles are spent in the "RX" part than in the "TX", which basically
> only has to enqueue a packet into the device's DMA ring, and recycle
> already transmitted mbufs.  The other issue with having separate CPUs
> handling RX and TX parts of the same interface would be the locking
> mess -> you would end up with the per-data-structure locking model of
> FreeBSD 5.0 and later, which DragonFly diverted from.

And what about using each CPU for both RX and TX? That is, binding a packet
to a single CPU for both its RX and TX processing?
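
To make the idea concrete, here is a minimal sketch of one way such a
binding could be chosen: hash the flow's addresses and ports to a CPU index,
so that every packet of a flow (in either direction) is handled from RX to
TX by the same CPU. Everything in it is an assumption for illustration: the
struct flow_key layout, the mix32() mixer and flow_to_cpu() are hypothetical
names, not existing DragonFly or FreeBSD interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key; field names are illustrative only. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* Simple 32-bit integer mixer used to spread the bits (not a kernel API). */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/*
 * Map a flow to a CPU index in [0, ncpus).  Addresses are combined with
 * XOR and ports with addition, both commutative, so swapping source and
 * destination yields the same CPU for the reverse direction of the flow.
 */
static unsigned flow_to_cpu(const struct flow_key *k, unsigned ncpus)
{
    uint32_t addrs, ports;

    assert(ncpus > 0);
    addrs = k->src_ip ^ k->dst_ip;
    ports = (uint32_t)k->src_port + (uint32_t)k->dst_port;
    return mix32(addrs + mix32(ports)) % ncpus;
}
```

With something like this, the CPU that picks a packet up on RX would also
enqueue it on the outgoing interface, so the RX and TX halves of one packet
never have to be handed over between CPUs. Whether the TX ring could then be
touched without extra locking is a separate question that this sketch does
not address.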

Cheers
-- 
Alfredo Beaumont. GPG: http://aintel.bi.ehu.es/~jtbbesaa/jtbbesaa.gpg.asc
Elektronika eta Telekomunikazioak Saila (Ingeniaritza Telematikoa)
Euskal Herriko Unibertsitatea, Bilbao (Basque Country). http://www.ehu.es