em driver - issue #2
EM1897 at aol.com
Mon Feb 7 10:03:04 PST 2005
In a message dated 2/7/2005 9:07:23 AM Eastern Standard Time, Joerg
Sonnenberger <joerg at xxxxxxxxxxxxxxxxx> writes:
>On Sun, Feb 06, 2005 at 03:04:38PM -0500, EM1897 at xxxxxxx wrote:
>> I think there are a couple of things wrong with that solution.
>> First, controllers know what to do with empty descriptors, in that
>> they fall into an RNR condition. That's part of the basic design. It's
>> the driver's responsibility to clear up such conditions. At 145Kpps,
>> you're not going to achieve much by trying to fool the driver into
>> thinking that it has memory, except losing a lot of packets. The point
>> of the RNR condition is to get the other end to stop sending until you
>> can handle it. The driver is doing something wrong in this case, and
>> it needs to be cleaned up properly.
>
>With 145Kpps, the interrupt load is too high for the system to do
>anything else. You can try DEVICE_POLLING, which allows the card to use
>the normal dropping mechanism for RX overflow. I don't think this is a
>fault of DragonFly; you could get the same situation with multiple
>network cards at the same packet rate in other systems.
>
>Joerg
>
After reading this I realized that you are right: the reason the memory
allocations fail is that the box is interrupt-bound (which is just what I
was trying to achieve when I started this test). I didn't choose 145Kpps
by accident; I was trying to find the point at which the machine would
livelock, to compare it to FreeBSD (since top wasn't working). Usually I
fire about 30Kpps (a typical load on a busy 100Mb/s network) and see what
percentage of system resources is used, to index the performance of the
box. 145K is more than this particular box can handle. A faster box can
easily FORWARD 300Kpps, so it's not the raw number but the box's
capability that matters. I hadn't considered that I'm working with a
32-bit bus on this system.
Lowering the test to 95Kpps, DragonFly handled it without any problems.
So I'd say that the failure to get mbuf clusters is a function of the
system being perpetually overloaded. However, the elegance with which a
system handles an overload condition is important. The fact that the em
driver doesn't recover normally is the issue now. You can't have a spurt
of packets bringing down the system.
Joerg, I don't see where the em driver checks for descriptors that don't
have buffers allocated, as would happen when one couldn't be replenished
at receive time. Is it completely broken in terms of handling this
situation?