altq spinlock and if_start dispatch

Matthew Dillon dillon at apollo.backplane.com
Sat Apr 12 10:32:10 PDT 2008


:Hi all,
:
:In an experiment I conducted about a week ago, I found that ifnet
:serializer contention (if_output vs. the NIC's txeof) had a negative
:effect on network forwarding performance, so I created this patch:
:http://leaf.dragonflybsd.org/~sephe/ifq_spinlock.diff3
:...
:
:I considered dispatching the outgoing mbuf to the NIC's
:interrupt/polling CPU to do the enqueue and if_start, thus avoiding
:the spinlock, but upper layers, like TCP, process ifq_handoff()
:errors (e.g. ENOBUFS); dispatching the outgoing mbuf would break the
:original semantics.  However, the only source of error in
:ifq_handoff() is ifq_enqueue(), i.e. only ifq_enqueue() must be
:called directly on the output path.  A spinlock is chosen to protect
:ifnet.if_snd, so ifq_enqueue() and ifnet.if_start() can be decoupled.
:'1)' implies that ifq_poll->driver_encap->ifq_dequeue no longer
:works, but I think drivers can easily be converted to do
:ifq_dequeue+driver_encap without packet loss.  '1)' is the
:precondition for '2)' and '3)' to work.  '2)' and '3)' together
:avoid ifnet serializer contention.  '4)' is added since another
:experiment of mine showed that serious ipiq overflow can have a very
:bad impact on overall performance.
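
    (To illustrate the ifq_dequeue+driver_encap conversion described
    above: a minimal user-space sketch, with hypothetical stand-in
    names rather than the real ifq API.  On encapsulation failure the
    packet is prepended back onto the queue, so nothing is lost.)

#include <stdio.h>
#include <stdlib.h>

struct pkt { struct pkt *next; int id; };
struct pktq { struct pkt *head, *tail; };

static struct pkt *
pkt_dequeue(struct pktq *q)
{
    struct pkt *p = q->head;

    if (p != NULL) {
        q->head = p->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return (p);
}

/* Undo a dequeue: put the packet back at the head of the queue. */
static void
pkt_prepend(struct pktq *q, struct pkt *p)
{
    p->next = q->head;
    q->head = p;
    if (q->tail == NULL)
        q->tail = p;
}

/* Stand-in for driver encapsulation; fails when the TX ring is full. */
static int
drv_encap(struct pkt *p, int *tx_free)
{
    if (*tx_free == 0)
        return (-1);            /* ENOBUFS in a real driver */
    (*tx_free)--;
    printf("encapsulated packet %d\n", p->id);
    return (0);
}

/* dequeue+encap start loop: requeue on failure instead of dropping. */
static void
drv_start(struct pktq *q, int *tx_free)
{
    struct pkt *p;

    while ((p = pkt_dequeue(q)) != NULL) {
        if (drv_encap(p, tx_free) != 0) {
            pkt_prepend(q, p);  /* no packet loss */
            break;
        }
        free(p);
    }
}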
:
:I have gathered the following data from before and after the patch:
:http://leaf.dragonflybsd.org/~sephe/data.txt
:http://leaf.dragonflybsd.org/~sephe/data_patched.txt
:
:boxa --+----> em0 -Routing box- em1 ----> msk0
:boxb --+
:
:boxa (2 x PIII 1GHz) and boxb (AthlonXP 2100+) each has one 32-bit PCI
:...
:
:Fast forwarding is improved a lot in the BGL case, probably because
:the time spent on the input path is greatly reduced by if_start
:dispatching.
:This patch does introduce a regression in the MP-safe case when
:em0/em1 is on a different CPU than the packet's target CPU on the
:Routing box; it may be caused by IPIs bouncing between the two CPUs
:(I haven't found the source of the problem yet).
:Fast forwarding performance drops a lot in the MP-safe case if em0
:and em1 are on different CPUs; reducing serializer contention does
:help (~40Kpps improvement), but something still needs to be figured
:out.
:
:So please review the patch.  It is not finished yet, but the major
:part is done and I want to call for review before I stray too far.
:
:Best Regards,
:sephe
:
:-- 
:Live Free or Die

    Ok, I think one of the biggest issues is the MP lock.  I'm assuming
    you were running with kern.intr_mpsafe set to 1 for these tests? 

    The way to determine whether the MP lock is creating a problem or
    not is to use the KTR logging facility and log all lock contention
    (and maybe even add additional KTR facilities to track exactly what
    the packet path is doing).

    None of the tests are going to give us real results unless the MP
    lock is completely out of the test path.  That should be the #1
    issue before we start splitting the serializer locks.   I suspect
    some of your IPI bouncing issues may be due to the MP lock.
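
    To make that concrete: a rough user-space model of the idea (this
    is not the real KTR API, just an illustration of a lock wrapper
    logging contention events into a small wrap-around trace buffer):

#include <pthread.h>
#include <time.h>

#define TRACE_SLOTS 1024

/* Wrap-around event log, loosely modeled on a KTR trace buffer. */
struct trace_ring {
    struct {
        clock_t     ts;
        const char  *what;
    } ev[TRACE_SLOTS];
    unsigned    idx;
};

static void
trace_event(struct trace_ring *tr, const char *what)
{
    unsigned i = tr->idx++ % TRACE_SLOTS;

    tr->ev[i].ts = clock();
    tr->ev[i].what = what;
}

/* Lock wrapper: only contended acquisitions leave a trace entry. */
static void
lock_traced(pthread_mutex_t *mtx, struct trace_ring *tr)
{
    if (pthread_mutex_trylock(mtx) == 0)
        return;                 /* uncontended, no event logged */
    trace_event(tr, "mtx contended");
    pthread_mutex_lock(mtx);
}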
    
:The ideas behind this patch are:
:1) altq is protected by its own spinlock instead of ifnet's serializer.

    I like the idea.
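
    As a user-space model of what that buys us (hypothetical names,
    with a pthread spinlock standing in for the kernel spinlock): the
    lock lives in the queue and covers only the queue state, so the
    enqueue path never touches the ifnet serializer, and errors such
    as ENOBUFS are still returned synchronously to the caller:

#include <errno.h>
#include <pthread.h>

#define IFQ_MAXLEN_MODEL 50

struct mbuf_stub { struct mbuf_stub *next; };

struct ifqueue_model {
    pthread_spinlock_t  lock;   /* protects head/tail/len only */
    struct mbuf_stub    *head;
    struct mbuf_stub    *tail;
    int                 len;
};

/* Assumes pthread_spin_init() was done when the queue was set up. */
static int
ifq_enqueue_model(struct ifqueue_model *q, struct mbuf_stub *m)
{
    int error = 0;

    pthread_spin_lock(&q->lock);
    if (q->len >= IFQ_MAXLEN_MODEL) {
        error = ENOBUFS;        /* synchronous, as TCP expects */
    } else {
        m->next = NULL;
        if (q->tail != NULL)
            q->tail->next = m;
        else
            q->head = m;
        q->tail = m;
        q->len++;
    }
    pthread_spin_unlock(&q->lock);
    return (error);
}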

:2) ifnet's serializer is pushed down into each ifnet.if_output
:implementation, i.e. if_output is called without ifnet's serializer
:being held

    I agree that having just one serializer for a network device driver
    is not enough.  Ultimately I think we do want two or three
    serializers:  One for the interrupt (RX packet handling and TX
    packet cleanup after xmit has completed), one for transmit packet
    queueing, and possibly one for the control path (ifconfig etc).
    This last one doesn't really need to exist; the kernel could
    instead acquire both of the other two serializers when doing
    control operations.

    So, yes, the TX packet queue to interface mechanic would definitely
    have its own serializer.
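
    As a toy model of that split (pthread mutexes standing in for the
    LWKT serializers; field names are made up):

#include <pthread.h>

struct ifnet_locks_model {
    pthread_mutex_t intr_ser;   /* RX path + TX-completion cleanup */
    pthread_mutex_t tx_ser;     /* TX packet queueing / if_start */
    /* no third lock: the control path just takes both */
};

/* Control ops take both locks in a fixed order to avoid deadlock. */
static void
ifnet_ctl_enter(struct ifnet_locks_model *lk)
{
    pthread_mutex_lock(&lk->intr_ser);
    pthread_mutex_lock(&lk->tx_ser);
}

static void
ifnet_ctl_exit(struct ifnet_locks_model *lk)
{
    pthread_mutex_unlock(&lk->tx_ser);
    pthread_mutex_unlock(&lk->intr_ser);
}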

    But I think the only way to do this cleanly is to have the callers
    into the API acquire the appropriate serializers instead of forcing
    each individual network driver to acquire them.  Remember how easy it
    was for me to implement the original serializer plan?  I want things
    to continue to be that easy.
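
    In other words, something like the sketch below (hypothetical
    names, reusing the mutex stand-ins from the model above): the
    dispatch wrapper in the API layer owns the locking, and the
    driver's start routine itself stays lock-free:

struct ifnet_model {
    pthread_mutex_t     tx_ser;
    void                (*if_start)(struct ifnet_model *);
};

/* Caller-side API: acquire the TX serializer, then call the driver. */
static void
ifnet_start_dispatch(struct ifnet_model *ifp)
{
    pthread_mutex_lock(&ifp->tx_ser);
    ifp->if_start(ifp);         /* driver never locks internally */
    pthread_mutex_unlock(&ifp->tx_ser);
}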

:3) ifnet.if_start is dispatched to the CPU where the NIC's interrupt
:is handled or where polling(4) is going to happen

    This shouldn't be necessary.  This sounds like an issue the MP
    lock might have been affecting.  Acquiring the TX packet interface
    enqueue serializer ought to be enough; then simply call the entry
    point directly (and also use your if_start trick below).
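
    Building on the toy models above, the direct-call path would look
    roughly like this -- enqueue under the queue's own spinlock, then
    run if_start on the current CPU under the TX serializer, no IPI
    needed (hypothetical names again):

/* Reuses ifqueue_model / ifnet_model from the sketches above. */
static int
ifq_handoff_model(struct ifnet_model *ifp, struct ifqueue_model *q,
                  struct mbuf_stub *m)
{
    int error = ifq_enqueue_model(q, m);    /* synchronous ENOBUFS */

    if (error == 0)
        ifnet_start_dispatch(ifp);          /* direct call, any CPU */
    return (error);
}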

:4) ifnet.if_start's IPI sending is avoided as much as possible, using
:the same mechanism that avoids real IPI sending in the ipiq
:implementation

    I like this part of the patch.  Definitely a good idea.
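
    The basic trick, as a user-space sketch (not the actual ipiq
    code): an atomic "already scheduled" flag so that only the first
    enqueuer after a drain sends the wakeup IPI:

#include <stdatomic.h>
#include <stdbool.h>

struct start_sched_model {
    atomic_bool scheduled;      /* one pending if_start run at most */
};

static void
send_ipi_model(int cpu)         /* stand-in for a real IPI send */
{
    (void)cpu;
}

static void
schedule_if_start(struct start_sched_model *s, int target_cpu)
{
    /* Only the 0 -> 1 transition actually sends the IPI. */
    if (!atomic_exchange(&s->scheduled, true))
        send_ipi_model(target_cpu);
}

/*
 * Target CPU: clear the flag before draining, so a late enqueuer
 * either sees it still set or triggers a fresh IPI.
 */
static void
if_start_run_model(struct start_sched_model *s)
{
    atomic_store(&s->scheduled, false);
    /* ... drain the send queue here ... */
}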

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>
