UDP inpcbs and more (was Re: tcpcb etc)

Sun Jun 8 07:37:54 PDT 2008

On Friday 06 June 2008, Matthew Dillon wrote:
> :First, UDP sockets can issue connect() multiple times (thus changing
> :faddr, fport) and call bind to go from wildcard laddr to a specific one.
> :When this happens, the inpcb must be moved to another cpu. This shouldn't
> :be too hard to handle; just mark the old inpcb as BEING_DELETED and only
> :delete it after inserting the new inpcb. Dropping a few packets is expected
> :for UDP and this shouldn't happen very often anyway.
> :....
> :
> :Then there is the more interesting issue of how to hash. As described above,
> :the lport is the only field we can be sure is not a wildcard. Now consider
> :a UDP (say DNS) server; such a server does not normally connect() so whatever
> :hash function we choose, the inpcb is going to end up on one cpu. This is the
> :cpu we would normally dispatch an incoming UDP packet to. The thing is, all
> :datagrams for our UDP server will end up going through the same cpu. So our
> :busy DNS server just can't scale: using only one protocol thread is going to
> :be a bottleneck.
> 
>     My personal opinion is that we should just hash on laddr/lport and not
>     worry about the very few applications that try to demux packets with
>     multiple threads from the same socket.  At least not for now.

Well, DNS is probably the most important protocol built on top of UDP, and
since a DNS exchange is normally a single request/response ping-pong, there
would be little need for synchronisation between threads reading from the
same socket. Allowing for scalable DNS servers seems important to me.
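
To make the concern concrete, here is a minimal sketch of dispatching on
l{addr,port} only. The function names, the mixing constant and the way the
cpu count is passed in are all assumptions made for illustration; this is not
the actual kernel code. It only shows that a socket which never connect()s
maps to a single cpu no matter who sends to it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the real kernel code): choose the cpu that
 * owns a UDP inpcb by hashing only on its local address and port.
 * A wildcard local address is represented as 0 here.
 */
unsigned int
udp_lhash_cpu(uint32_t laddr, uint16_t lport, unsigned int ncpus)
{
	uint32_t h;

	assert(ncpus > 0);
	h = laddr ^ lport;
	h ^= h >> 16;
	h *= 0x45d9f3bU;	/* arbitrary odd multiplier to mix the bits */
	h ^= h >> 16;
	return (h % ncpus);
}

/*
 * Cpu an incoming datagram would be dispatched to. The foreign address
 * and port are deliberately ignored: an unconnected server socket has
 * them as wildcards, so they cannot take part in the hash.
 */
unsigned int
udp_dispatch_cpu(uint32_t faddr, uint16_t fport,
    uint32_t laddr, uint16_t lport, unsigned int ncpus)
{
	(void)faddr;
	(void)fport;
	return (udp_lhash_cpu(laddr, lport, ncpus));
}
```

With a server bound to port 53, every client ends up on the same cpu, which
is exactly the bottleneck discussed above.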

> :option changing under it. However, our sockbuf can't handle concurrent
> :accesses, so we'd have to have multiple sockbufs (one per cpu) and then
> :the socket layer would have to pull data from all of them (probably in a
> :round-robin fashion). UDP does not guarantee in-order delivery but, since
> :in-order is typically the case, I'm not sure how well the apps can handle
> :it. On top of that we'd need to decide what to do about buffer size limits
> :and whether the sockbufs should stay in struct socket.
[...]
>     Our sockbufs need a general SMP solution, I think probably a spinlock
>     may be best due to the concurrency.

That's so boring :) (and AFAICT unnecessary).
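
For reference, a rough sketch of the per-cpu sockbuf idea quoted above, with
the socket layer pulling round-robin. Everything here (struct names, NCPUS,
QLEN, datagrams modelled as plain ints) is made up for the example, and it is
single-threaded on purpose: it only shows the pull order, not how each queue
would be kept private to its cpu.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS	4	/* assumed cpu count for the sketch */
#define QLEN	8	/* assumed per-cpu queue capacity */

/* Hypothetical per-cpu receive queue; a datagram is just an int here. */
struct pcpu_sockbuf {
	int	data[QLEN];
	size_t	head;		/* index of the oldest queued datagram */
	size_t	count;		/* number of queued datagrams */
};

struct pcpu_socket {
	struct pcpu_sockbuf sb[NCPUS];
	unsigned int next_cpu;	/* queue the next pull starts scanning at */
};

/* Queue a datagram on one cpu's sockbuf; -1 means full, i.e. a drop. */
int
pcpu_sb_append(struct pcpu_socket *so, unsigned int cpu, int dgram)
{
	struct pcpu_sockbuf *sb;

	assert(cpu < NCPUS);
	sb = &so->sb[cpu];
	if (sb->count == QLEN)
		return (-1);
	sb->data[(sb->head + sb->count) % QLEN] = dgram;
	sb->count++;
	return (0);
}

/* Pull one datagram, scanning the queues round-robin; -1 if all empty. */
int
pcpu_sb_pull(struct pcpu_socket *so, int *dgram)
{
	struct pcpu_sockbuf *sb;
	unsigned int i, cpu;

	for (i = 0; i < NCPUS; i++) {
		cpu = (so->next_cpu + i) % NCPUS;
		sb = &so->sb[cpu];
		if (sb->count == 0)
			continue;
		*dgram = sb->data[sb->head];
		sb->head = (sb->head + 1) % QLEN;
		sb->count--;
		so->next_cpu = (cpu + 1) % NCPUS;
		return (0);
	}
	return (-1);
}
```

If datagrams 1 and 2 are queued on cpu 0 and datagram 3 on cpu 1, the reader
sees 1, 3, 2, which is the reordering the quoted text worries about.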

>     I'd say we should get it working first, and then worry about optimizing
>     it.

Well, sure, but it helps if we have a vague idea about where we're headed,
don't you think? So I guess the short-term roadmap should be: a) do tcpcb
as described in a previous mail, b) make the UDP inpcbs per-cpu, hashing on
l{addr,port}, c) deal with struct socket accesses, d) remove the BGL from
TCP/UDP, e) fix bugs, f) fix more bugs, g) measure, h) work on interesting
stuff. Well, not necessarily in that order ;)

Aggelos