UDP inpcbs and more (was Re: tcpcb etc)

Fri Jun 6 12:22:18 PDT 2008

On Friday 06 June 2008, Aggelos Economopoulos wrote:
> On Monday 05 May 2008, Aggelos Economopoulos wrote:
[...]
> > On second thought, let me ask for input sooner rather than later.
[...]
> OK, same thing, but now it's the pcbs. TCP is "easy".
[...]
> My plan is to start a discussion on the more interesting in_pcb 
> situation on kernel@ this weekend.

Currently, the inpcbs for UDP all live in one global hash table which is
protected only by (you guessed it) the BGL. The straightforward way to
avoid this would be to split the table into per-cpu tables a la TCP. This
presents two problems.

First, a UDP socket can call connect() multiple times (changing faddr and
fport each time) and can call bind() to go from a wildcard laddr to a
specific one. When this happens the hash changes, so the inpcb may have to
be moved to another cpu. This shouldn't be too hard to handle: just mark
the old inpcb as BEING_DELETED and only delete it after inserting the new
inpcb. Dropping a few packets is expected for UDP and this shouldn't
happen very often anyway.
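
To make the ordering concrete, here is a rough userland sketch of the
mark / insert / delete sequence. Everything in it is made up for
illustration (toy_pcb, toy_rekey(), the XOR hash, NCPUS, the flag name
PCB_BEING_DELETED); it is not the real in_pcb code and it ignores the
cross-cpu messaging a kernel version would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS             4     /* assumed number of protocol threads */
#define PCB_BEING_DELETED 0x1   /* hypothetical flag name */

struct toy_pcb {
    uint32_t faddr, laddr;
    uint16_t fport, lport;
    int flags;
    struct toy_pcb *next;
};

static struct toy_pcb *pcb_table[NCPUS];    /* one list per cpu */

/* Toy hash: any change to the 4-tuple may change the owning cpu. */
static int
toy_cpu(const struct toy_pcb *p)
{
    return (int)((p->faddr ^ p->laddr ^ p->fport ^ p->lport) % NCPUS);
}

static void
toy_insert(struct toy_pcb *p)
{
    int cpu = toy_cpu(p);

    p->next = pcb_table[cpu];
    pcb_table[cpu] = p;
}

static void
toy_remove(struct toy_pcb *p, int cpu)
{
    struct toy_pcb **pp;

    for (pp = &pcb_table[cpu]; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == p) {
            *pp = p->next;
            p->next = NULL;
            return;
        }
    }
}

/* Lookups skip pcbs that are on their way out; such packets get dropped. */
static struct toy_pcb *
toy_lookup(int cpu, uint16_t lport)
{
    struct toy_pcb *p;

    for (p = pcb_table[cpu]; p != NULL; p = p->next) {
        if (p->lport == lport && !(p->flags & PCB_BEING_DELETED))
            return p;
    }
    return NULL;
}

/* connect()-style re-key; returns the cpu that owns the pcb afterwards. */
static int
toy_rekey(struct toy_pcb *oldp, struct toy_pcb *newp,
    uint32_t faddr, uint16_t fport)
{
    int oldcpu = toy_cpu(oldp);

    assert(oldp != newp);
    *newp = *oldp;
    newp->faddr = faddr;
    newp->fport = fport;
    newp->flags = 0;
    newp->next = NULL;

    oldp->flags |= PCB_BEING_DELETED;   /* 1: stop matching the old pcb */
    toy_insert(newp);                   /* 2: publish the new pcb */
    toy_remove(oldp, oldcpu);           /* 3: only now unlink the old one */
    return toy_cpu(newp);
}
```

Between steps 1 and 2 a lookup finds neither pcb, which is the window
where a few datagrams get dropped.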

Then there is the more interesting issue of how to hash. As described above,
the lport is the only field we can be sure is not a wildcard. Now consider
a UDP (say DNS) server; such a server does not normally connect() so whatever
hash function we choose, the inpcb is going to end up on one cpu. This is the
cpu we would normally dispatch an incoming UDP packet to. The thing is, all
datagrams for our UDP server will end up going through the same cpu. So our
busy DNS server just can't scale: using only one protocol thread is going to
be a bottleneck.
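
A toy illustration of the bottleneck, assuming we hash on lport alone
(toy_lport_cpu() and the modulo hash are invented for this sketch; any
function of lport only behaves the same way): every datagram for port 53
lands on one cpu no matter who sent it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS 4     /* assumed number of protocol threads */

struct toy_dgram {
    uint32_t saddr;
    uint16_t sport;
    uint16_t dport;
};

/* Hypothetical dispatch function: only lport is known to be non-wildcard. */
static int
toy_lport_cpu(uint16_t lport)
{
    return lport % NCPUS;
}

/* Count how many of the n datagrams would be dispatched to each cpu. */
static void
toy_count_dispatch(const struct toy_dgram *d, size_t n,
    unsigned counts[NCPUS])
{
    size_t i;

    assert(n == 0 || d != NULL);
    for (i = 0; i < NCPUS; i++)
        counts[i] = 0;
    for (i = 0; i < n; i++)
        counts[toy_lport_cpu(d[i].dport)]++;
}
```

Feeding it datagrams from different sources that all target port 53 puts
the whole load in a single counts[] slot; the source address and port
never enter into it.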

If we decide to allow multiple UDP protocol threads to access the same
socket, we might have to lock around accesses to it, but AFAICT that won't
be necessary. UDP does not touch most socket fields in the input/output
paths and it seems to me the code can survive a socket option changing
under it. However, our sockbuf can't handle concurrent accesses, so we'd
need multiple sockbufs (one per cpu) and the socket layer would then have
to pull data from all of them (probably in a round-robin fashion). UDP
does not guarantee in-order delivery but, since datagrams typically do
arrive in order, I'm not sure how well apps would handle the extra
reordering. On top of that we'd need to decide what to do about buffer
size limits and whether the sockbufs should stay in struct socket.
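
Here is roughly what I mean by per-cpu sockbufs with a round-robin pull,
as a userland toy. The names (toy_sockbuf, toy_sbappend(),
toy_soreceive()), the fixed-size ring and the per-cpu limit QLEN are all
assumptions for the sketch; real sockbufs hold mbuf chains and the size
limit question is exactly what is still open.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4     /* assumed number of protocol threads */
#define QLEN  8     /* assumed per-cpu limit, in datagrams */

struct toy_sockbuf {
    int pkt[QLEN];          /* datagram ids, standing in for mbufs */
    size_t head, count;
};

struct toy_socket {
    struct toy_sockbuf rcv[NCPUS];  /* one receive buffer per cpu */
    int next_cpu;                   /* where the next pull starts */
};

/* Called by the protocol thread on `cpu`; touches only that cpu's buffer. */
static int
toy_sbappend(struct toy_socket *so, int cpu, int pkt)
{
    struct toy_sockbuf *sb;

    assert(cpu >= 0 && cpu < NCPUS);
    sb = &so->rcv[cpu];
    if (sb->count == QLEN)
        return -1;                  /* buffer full: drop */
    sb->pkt[(sb->head + sb->count) % QLEN] = pkt;
    sb->count++;
    return 0;
}

/* Socket layer: take one datagram, scanning the cpus round-robin. */
static int
toy_soreceive(struct toy_socket *so, int *pkt)
{
    int i;

    for (i = 0; i < NCPUS; i++) {
        int cpu = (so->next_cpu + i) % NCPUS;
        struct toy_sockbuf *sb = &so->rcv[cpu];

        if (sb->count > 0) {
            *pkt = sb->pkt[sb->head];
            sb->head = (sb->head + 1) % QLEN;
            sb->count--;
            so->next_cpu = (cpu + 1) % NCPUS;
            return 0;
        }
    }
    return -1;                      /* nothing queued on any cpu */
}
```

If datagrams 1 and 2 are queued on cpu 0 and datagram 3 on cpu 1, the app
reads them as 1, 3, 2, so the round-robin pull reorders across cpus even
when nothing was reordered on the wire. Note also that with a per-cpu
limit the socket as a whole can hold NCPUS * QLEN datagrams, which is one
way the buffer size question shows up.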

OK, this should get the discussion started :)

Aggelos