UDP inpcbs and more (was Re: tcpcb etc)

Sepherosa Ziehau sepherosa at gmail.com
Fri Jun 6 20:01:52 PDT 2008


On Sat, Jun 7, 2008 at 3:16 AM, Aggelos Economopoulos
<aoiko at cc.ece.ntua.gr> wrote:
> On Friday 06 June 2008, Aggelos Economopoulos wrote:
>> On Monday 05 May 2008, Aggelos Economopoulos wrote:
> [...]
>> > On second thought, let me ask for input sooner rather than later.
> [...]
>> OK, same thing, but now it's the pcbs. TCP is "easy".
> [...]
>> My plan is to start a discussion on the more interesting in_pcb
>> situation on kernel@ this weekend.
>
> Currently, inpcb's for UDP are all in a global hash table which is
> only protected by (you guessed it) the BGL. The straightforward
> way to avoid this would be to break the table a la TCP. This presents
> two problems.
>
> First, UDP sockets can issue connect() multiple times (thus changing
> faddr, fport) and call bind to go from wildcard laddr to a specific one.
> When this happens, the inpcb must be moved to another cpu. This shouldn't
> be too hard to handle; just mark the old inpcb as BEING_DELETED and only
> delete it after inserting the new inpcb. Dropping a few packets is expected
> for UDP and this shouldn't happen very often anyway.

The following are my vague thoughts:
Add per-CPU state to the UDP inpcb.  I think the per-CPU state should
have: an input sockbuf, the addr/port pair, and a valid bit.
The valid bit is on if the UDP socket is not connected, or if the UDP
socket is connected and the {l,f}{addr,port} hash value equals the
state's owner CPU.  It is mainly for input packet validation.
The hash should be quite straightforward for a connected UDP socket.
For an unconnected UDP socket, f{addr,port} comes from the user.  The
problem is the lport upon the first send, and the laddr for an unbound
UDP socket.  The laddr for an unbound UDP socket could possibly be
handled by having a per-CPU route cache.  We could also let lport
determination happen only on CPU0; this is assumed to be time
consuming, but in the worst case it would only happen for the first
several packets in the lifetime of the socket.
There is no need to add an output sockbuf to the per-CPU state, and
the per-CPU input sockbuf does not need any lock.  We simply skip so's
input sockbuf.
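
To make the idea a bit more concrete, here is a rough userland sketch
of the per-CPU state and the valid-bit rule.  All names (udp_pcpu_state,
udp_tuple_hash, udp_pcpu_set_tuple, NCPUS and so on) are made up for
illustration and are not existing kernel structures or functions; the
hash is just a placeholder, and the sockbuf is reduced to a byte count.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCPUS 4                     /* hypothetical CPU count */

/* Hypothetical stand-in for a per-CPU input queue (no lock needed,
 * since only the owning CPU ever touches it). */
struct udp_sockbuf {
    uint32_t sb_cc;                 /* bytes queued on this CPU */
};

/* Address/port 4-tuple; zero means wildcard / not yet set. */
struct udp_tuple {
    uint32_t laddr, faddr;
    uint16_t lport, fport;
};

/* Per-CPU state: input sockbuf, addr/port pair, valid bit. */
struct udp_pcpu_state {
    struct udp_sockbuf rcv;
    struct udp_tuple   tuple;
    bool               valid;
};

struct udp_inpcb {
    bool                  connected;
    struct udp_pcpu_state pcpu[NCPUS];
};

/* Placeholder hash (an assumption, not the real dispatch hash):
 * maps {l,f}{addr,port} to an owner CPU. */
static unsigned
udp_tuple_hash(const struct udp_tuple *t)
{
    uint32_t h = t->laddr ^ t->faddr ^
        (((uint32_t)t->lport << 16) | t->fport);

    h ^= h >> 16;
    h ^= h >> 8;
    return h % NCPUS;
}

/* Refresh every CPU's copy after bind()/connect().  The valid bit is
 * on if the socket is not connected, or if it is connected and the
 * tuple hashes to this CPU. */
static void
udp_pcpu_set_tuple(struct udp_inpcb *inp, const struct udp_tuple *t,
    bool connected)
{
    unsigned owner = udp_tuple_hash(t);

    inp->connected = connected;
    for (unsigned cpu = 0; cpu < NCPUS; cpu++) {
        inp->pcpu[cpu].tuple = *t;
        inp->pcpu[cpu].valid = !connected || cpu == owner;
    }
}

/* Input-side validation: accept a packet on 'cpu' only if that CPU's
 * state is valid. */
static bool
udp_pcpu_input_ok(const struct udp_inpcb *inp, unsigned cpu)
{
    return inp->pcpu[cpu].valid;
}
```

With this layout an unconnected socket accepts input on every CPU,
while a connected one accepts input only on the CPU its tuple hashes
to, so a stale packet arriving on another CPU after connect() is simply
rejected there.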

>
> Then there is the more interesting issue of how to hash. As described above,
> the lport is the only field we can be sure is not a wildcard. Now consider
> a UDP (say DNS) server; such a server does not normally connect() so whatever
> hash function we choose, the inpcb is going to end up on one cpu. This is the
> cpu we would normally dispatch an incoming UDP packet to. The thing is, all
> datagrams for our UDP server will end up going through the same cpu. So our
> busy DNS server just can't scale: using only one protocol thread is going to
> be a bottleneck.

This case falls under {wildcard,bound} laddr with a bound lport.  With
the per-CPU UDP state I mentioned above, the workload should be
distributable across CPUs in a normal use case.
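
As a rough illustration of why the load can spread: if incoming
datagrams are dispatched by a hash that includes the sender's
addr/port, and the unconnected server socket has a valid per-CPU input
queue on every CPU, then each CPU just appends to its own queue.  The
sketch below is userland C with made-up names (udp_server_pcb,
pcpu_rcvq, pkt_cpu, QLEN) and a placeholder hash; it only shows the
shape of the idea, not how the real dispatch code works.

```c
#include <stddef.h>
#include <stdint.h>

#define NCPUS 4                     /* hypothetical CPU count */
#define QLEN  8                     /* hypothetical per-CPU queue depth */

/* Per-CPU input queue; only its owner CPU appends, so no lock. */
struct pcpu_rcvq {
    uint16_t fport[QLEN];           /* source ports of queued datagrams */
    size_t   count;
};

/* Unconnected server socket bound to lport, one queue per CPU. */
struct udp_server_pcb {
    uint16_t         lport;
    struct pcpu_rcvq rcv[NCPUS];
};

/* Placeholder dispatch hash (an assumption): pick the CPU for an
 * incoming datagram from the sender's addr/port and our lport. */
static unsigned
pkt_cpu(uint32_t faddr, uint16_t fport, uint16_t lport)
{
    uint32_t h = faddr ^ (((uint32_t)fport << 16) | lport);

    h ^= h >> 16;
    h ^= h >> 8;
    return h % NCPUS;
}

/* Runs on 'cpu' only.  Returns 0 on success, -1 if the per-CPU queue
 * is full and the datagram is dropped. */
static int
udp_input_on_cpu(struct udp_server_pcb *inp, unsigned cpu, uint16_t fport)
{
    struct pcpu_rcvq *q = &inp->rcv[cpu];

    if (q->count == QLEN)
        return -1;
    q->fport[q->count++] = fport;
    return 0;
}

/* Reader side: total datagrams waiting across all per-CPU queues. */
static size_t
udp_total_queued(const struct udp_server_pcb *inp)
{
    size_t total = 0;

    for (unsigned cpu = 0; cpu < NCPUS; cpu++)
        total += inp->rcv[cpu].count;
    return total;
}
```

Datagrams from different clients land on different CPUs, and the
receiving side has to collect from all the per-CPU queues instead of
the single so_rcv.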

Best Regards,
sephe

-- 
Live Free or Die




