em driver - issue #2

Sun Feb 6 12:05:14 PST 2005

In a message dated 2/6/2005 1:26:04 PM Eastern Standard Time, Matthew Dillon 
<dillon at xxxxxxxxxxxxxxxxxxxx> writes:

>
>:Ok, I have some more info on this that may help to sort it out.
>:
>:I changed the m_getcl() call in em_get_buf() to MB_WAIT, and the symptoms
>:are interesting. After about 5 seconds, the mbuf-exhausted message
>:appears again, but it doesn't lock up. For a short time the NIC issues
>:flow-control frames (holding the transmit stream to a lower than optimal
>:rate); after a while the box is able to handle the full stream normally,
>:showing ~144000 received packets in the ICMP unreachable message, as
>:expected.
>:
>:I assume this is some sort of memory cache that gets built up over
>:time. Something that is pre-allocated in FreeBSD but not in Dfly?
>:
>:It appears that the lockup is a bug in the em driver, one that perhaps
>:just doesn't happen often enough for anyone to have gone to the trouble
>:of tracking it down. But it's exacerbated by the mbuf problem, so if that
>:can be cleared up it should be "good enough", at least for the time being.
>
>    My guess is that what is going on is that the EM device is unable
>    to allocate a buffer to the receive ring.  This creates 'holes'
>    in the receive ring.  The EM device's RX interrupt is stopping either
>    when it hits a hole, or if there are no good descriptors left in the
>    receive ring.  The restart only occurs on the next em_poll() or
>    em_intr() which in this case would be a transmit interrupt.
>
>    The best solution is probably to create a 'dummy' mbuf to act as filler
>    when the device is unable to allocate a new one, and then ignore such
>    mbufs (drop the related packets) when they are encountered.
>
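
To make the "holes in the receive ring" hypothesis concrete, here is a small Python model of a receive ring whose refill allocations start failing. It is not em(4) code: `Slot`, `Pool`, `nic_receive`, `drain_rx`, and the shared `DUMMY` filler are hypothetical names chosen for illustration, and the model assumes there is no retry path that refills holes later. With `use_dummy=False`, failed refills leave holes and the simulated NIC stops when it reaches one; with `use_dummy=True`, the filler keeps every slot posted and any frame landing in it is dropped.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RING_SIZE = 8
DUMMY = object()  # hypothetical shared filler buffer


@dataclass
class Slot:
    buf: Optional[object]  # None models a hole: no buffer posted
    done: bool = False     # set by the simulated NIC when a frame lands here


class Pool:
    """Hypothetical allocator that fails once `limit` buffers are handed out."""

    def __init__(self, limit: int):
        self.limit, self.used = limit, 0

    def alloc(self) -> Optional[object]:
        if self.used >= self.limit:
            return None
        self.used += 1
        return f"buf{self.used}"


def nic_receive(ring: List[Slot], pos: int, frames: int) -> Tuple[int, int]:
    """Simulated NIC: fill consecutive posted slots, stop at a hole or full slot."""
    written = 0
    while written < frames:
        s = ring[pos]
        if s.buf is None or s.done:
            break
        s.done = True
        pos = (pos + 1) % RING_SIZE
        written += 1
    return pos, written


def drain_rx(ring: List[Slot], head: int, pool: Pool, use_dummy: bool):
    """Driver RX loop: harvest completed slots, refill each, stop at a hole."""
    delivered = dropped = 0
    for _ in range(RING_SIZE):
        s = ring[head]
        if s.buf is None or not s.done:
            break  # hole or not yet filled: processing stops here
        if s.buf is DUMMY:
            dropped += 1    # frame landed in the filler: discard it
        else:
            delivered += 1  # real buffer: would be passed up the stack
        fresh = pool.alloc()
        if fresh is None and use_dummy:
            fresh = DUMMY
        s.buf, s.done = fresh, False  # without the filler, fresh may be None
        head = (head + 1) % RING_SIZE
    return head, delivered, dropped


def run(use_dummy: bool, rounds: int = 5, frames: int = 4):
    pool = Pool(limit=RING_SIZE + 2)  # ring fills, then only 2 refills succeed
    ring = [Slot(pool.alloc()) for _ in range(RING_SIZE)]
    head = nic_pos = 0
    accepted = delivered = dropped = 0
    for _ in range(rounds):
        nic_pos, n = nic_receive(ring, nic_pos, frames)
        head, d, x = drain_rx(ring, head, pool, use_dummy)
        accepted, delivered, dropped = accepted + n, delivered + d, dropped + x
    return accepted, delivered, dropped


print(run(use_dummy=False))  # (10, 10, 0)
print(run(use_dummy=True))   # (20, 10, 10)
```

In this model both policies deliver the same 10 frames. The filler keeps the simulated NIC accepting frames (20 instead of 10), but every extra frame lands in the filler and is dropped.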

I think there are a couple of things wrong with that solution.
First, controllers know what to do with empty descriptors: they fall into a
receiver-not-ready (RNR) condition. That's part of the basic design, and it's
the driver's responsibility to clear such conditions. At 145Kpps, you're not
going to achieve much by trying to fool the controller into thinking that it
has memory, except losing a lot of packets. The point of the RNR condition is
to get the other end to stop sending until you can handle it. The driver is
doing something wrong in this case, and it needs to be fixed properly.
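
The approach argued for here, leaving descriptors empty and having the driver clear the condition, can be sketched as a small Python model. It is hypothetical, not em(4) code: `RxRing`, `refill`, and `make_alloc` are illustrative names, the `stalled` flag stands in for whatever out-of-buffers state the real controller enters, and buffers are posted in simple index order rather than from a hardware tail pointer. The point it shows is that the driver keeps retrying the refill on later ticks instead of giving up after the first allocation failure.

```python
from typing import Callable, List, Optional


class RxRing:
    """Hypothetical receive-ring model; None marks a descriptor with no buffer."""

    def __init__(self, size: int):
        self.bufs: List[Optional[str]] = [None] * size
        # Models the controller having run out of posted buffers (the RNR-style
        # state); while stalled, a NIC with flow control enabled can ask the
        # sender to pause rather than silently dropping frames.
        self.stalled = True

    def holes(self) -> int:
        return sum(b is None for b in self.bufs)


def refill(ring: RxRing, alloc: Callable[[], Optional[str]]) -> int:
    """Post buffers into empty descriptors, in index order, as memory allows.

    Meant to be called on every interrupt/poll tick (or from a timer) until
    the holes are gone; this retry is the driver's side of clearing the stall.
    """
    posted = 0
    for i, b in enumerate(ring.bufs):
        if b is None:
            new = alloc()
            if new is None:
                break  # still out of memory: leave the rest for the next tick
            ring.bufs[i] = new
            posted += 1
    if posted:
        ring.stalled = False  # a real driver would also hand the slots back to the NIC
    return posted


def make_alloc(fail_first: int) -> Callable[[], Optional[str]]:
    """Hypothetical allocator that fails its first `fail_first` calls."""
    calls = 0

    def alloc() -> Optional[str]:
        nonlocal calls
        calls += 1
        return None if calls <= fail_first else f"buf{calls}"

    return alloc


ring = RxRing(4)
alloc = make_alloc(fail_first=1)
print(refill(ring, alloc), ring.stalled)  # 0 True: no memory yet, stay stalled
print(refill(ring, alloc), ring.stalled)  # 4 False: ring restocked, receiver resumes
```

Unlike a filler buffer, nothing is silently discarded here: the stall persists exactly as long as memory is unavailable, which is what gives flow control a chance to slow the sender.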

The second thing that's wrong is where the real problem lies: the memory MUST
be available, and that is what has to be corrected. It's not acceptable for
allocation to fail the way it's failing. There's no excuse for a system with
20K clusters supposedly allocated to be unable to get the 1600th cluster
because of a "bucket" problem. The reason many drivers don't handle the
"can't get memory" condition is that it almost never happens in real-world
scenarios. It's a serious problem that it happens so quickly: 1000 packets at
gigabit speeds is a tiny amount of time. It makes little sense to redesign the
mbuf system only to leave it with such an inefficiency. I don't know enough
about it to know how other OSes do it, but they don't fail the way DragonFly
does in this instance.
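
To put "a tiny amount of time" in numbers, here is a quick Python calculation. It assumes standard Ethernet per-frame overhead (8 bytes of preamble/SFD plus a 12-byte inter-frame gap) and uses 64-byte minimum-size frames as the worst case; the ~144,000 pps figure is the rate reported earlier in the thread.

```python
def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Max frames/s on Ethernet: each frame also costs 8 bytes of
    preamble+SFD and a 12-byte inter-frame gap on the wire."""
    return link_bps / ((frame_bytes + 8 + 12) * 8)


def ms_for(packets: int, pps: float) -> float:
    """Milliseconds needed to receive `packets` frames at `pps` frames/s."""
    return packets / pps * 1000.0


gige_64 = line_rate_pps(1e9, 64)  # about 1,488,095 frames/s
print(f"{ms_for(1000, 144_000):.2f} ms at 144 kpps")                       # 6.94 ms
print(f"{ms_for(1000, gige_64):.3f} ms at GigE line rate, 64-byte frames")  # 0.672 ms
```

So 1000 packets last about 7 ms at the rate in this test, and well under a millisecond at full gigabit line rate with minimum-size frames.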