UDP inpcbs and more (was Re: tcpcb etc)
Aggelos Economopoulos
aoiko at cc.ece.ntua.gr
Sun Jun 8 07:37:54 PDT 2008
On Friday 06 June 2008, Matthew Dillon wrote:
> :First, UDP sockets can issue connect() multiple times (thus changing
> :faddr, fport) and call bind() to go from a wildcard laddr to a specific one.
> :When this happens, the inpcb may have to move to another cpu. This shouldn't
> :be too hard to handle; just mark the old inpcb as BEING_DELETED and only
> :delete it after inserting the new inpcb. Dropping a few packets is expected
> :for UDP and this shouldn't happen very often anyway.
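The mark-then-insert-then-delete order above can be sketched as follows. This is a single-threaded illustration under loudly hypothetical assumptions: the names (`struct pcb`, `pcb_table`, `INP_BEING_DELETED`, `pcb_migrate`), the fixed-size per-cpu tables and the toy 4-tuple placement hash are made up here and are not DragonFly's inpcb code. In the kernel, the insert and the removal would each run on, or be messaged to, the owning cpu's protocol thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and sizes; not DragonFly's real inpcb code. */
#define NCPUS             4
#define TABLE_SLOTS       8
#define INP_BEING_DELETED 0x1u

struct pcb {
    uint32_t laddr, faddr;   /* 0 = wildcard */
    uint16_t lport, fport;   /* fport 0 = wildcard (not connected) */
    unsigned flags;
};

/* One table per cpu; in a kernel each would belong to that cpu's protocol thread. */
static struct pcb *pcb_table[NCPUS][TABLE_SLOTS];

/* Toy placement rule: hash the full 4-tuple, so connect()/bind() may change the cpu. */
static int pcb_cpu(const struct pcb *p)
{
    uint32_t h = p->laddr ^ p->faddr ^ ((uint32_t)p->lport << 16) ^ p->fport;
    return (int)(h % NCPUS);
}

static int table_insert(int cpu, struct pcb *p)
{
    for (int i = 0; i < TABLE_SLOTS; i++) {
        if (pcb_table[cpu][i] == NULL) {
            pcb_table[cpu][i] = p;
            return 0;
        }
    }
    return -1;                                  /* table full */
}

static void table_remove(int cpu, const struct pcb *p)
{
    for (int i = 0; i < TABLE_SLOTS; i++)
        if (pcb_table[cpu][i] == p)
            pcb_table[cpu][i] = NULL;
}

/* Per-cpu lookup by local port that skips pcbs on their way out. */
static struct pcb *pcb_lookup(int cpu, uint16_t lport)
{
    for (int i = 0; i < TABLE_SLOTS; i++) {
        struct pcb *p = pcb_table[cpu][i];
        if (p != NULL && p->lport == lport && !(p->flags & INP_BEING_DELETED))
            return p;
    }
    return NULL;
}

/*
 * The tuple changed: mark the old pcb, insert the new one on its
 * (possibly different) cpu, and only then remove the old one.
 * Datagrams arriving in between may be dropped, which UDP tolerates.
 */
static int pcb_migrate(struct pcb *oldp, struct pcb *newp)
{
    int oldcpu = pcb_cpu(oldp);

    assert(oldp != newp);
    oldp->flags |= INP_BEING_DELETED;
    if (table_insert(pcb_cpu(newp), newp) != 0) {
        oldp->flags &= ~INP_BEING_DELETED;      /* roll back; keep the old binding */
        return -1;
    }
    table_remove(oldcpu, oldp);
    return 0;
}
```

For example, an unconnected pcb on port 5353 sits on cpu 0; after a connect() to 10.0.0.2:53 the new pcb hashes to cpu 3, and the old entry is gone from cpu 0 only after the new one is in place.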
> :....
> :
> :Then there is the more interesting issue of how to hash. As described above,
> :the lport is the only field we can be sure is not a wildcard. Now consider
> :a UDP (say DNS) server; such a server does not normally connect(), so whatever
> :hash function we choose, the inpcb is going to end up on one cpu. This is the
> :cpu we would normally dispatch an incoming UDP packet to. The thing is, all
> :datagrams for our UDP server will end up going through the same cpu. So our
> :busy DNS server just can't scale: using only one protocol thread is going to
> :be a bottleneck.
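To make the bottleneck concrete, here is a toy dispatcher. It is a sketch with hypothetical names (`struct udp_pkt`, `cpu_by_lport`, `cpu_by_tuple`, `tally`) and deliberately simple hashes, not a proposal for the real hash. Dispatching on the destination port alone sends every query for one server port to the same cpu; a 4-tuple hash would spread the load, but could no longer find a pcb whose faddr/fport are wildcards.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 4

/* Hypothetical header fields a dispatcher would look at (host byte order). */
struct udp_pkt {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Dispatch on the local (destination) port only: the one field an unconnected
 * server pcb never wildcards, so the pcb is guaranteed to be found there. */
static int cpu_by_lport(const struct udp_pkt *p)
{
    return p->dport % NCPUS;
}

/* Toy 4-tuple hash: spreads clients, but a pcb with wildcard faddr/fport
 * lives on only one cpu, so most datagrams would miss it. */
static int cpu_by_tuple(const struct udp_pkt *p)
{
    uint32_t h = p->saddr ^ p->daddr ^ p->sport ^ p->dport;
    h ^= h >> 16;                               /* fold the high half into the low bits */
    return (int)(h % NCPUS);
}

/* Count which cpu each of n distinct clients' queries to dport would go to. */
static void tally(int (*pick)(const struct udp_pkt *), uint16_t dport,
                  int n, int counts[NCPUS])
{
    assert(n >= 0);
    for (int c = 0; c < NCPUS; c++)
        counts[c] = 0;
    for (int i = 0; i < n; i++) {
        struct udp_pkt p = {
            .saddr = 0x0a000000u + (uint32_t)i, /* 10.0.x.y, one per client */
            .daddr = 0xc0000201u,               /* 192.0.2.1, the server */
            .sport = 5000,
            .dport = dport,
        };
        counts[pick(&p)]++;
    }
}
```

With 1000 clients querying port 53, `cpu_by_lport` puts all 1000 on cpu 1 (53 % 4), while `cpu_by_tuple` splits them 250 per cpu.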
>
> My personal opinion is that we should just hash on laddr/lport and not
> worry about the very few applications that try to demux packets with
> multiple threads from the same socket. At least not for now.
Well, DNS is probably the most important protocol built on top of UDP, and
since it normally consists of a ping-pong (one query, one reply), there would be
little need for synchronisation between threads reading from the same socket.
Allowing for scalable DNS servers seems important to me.
> :option changing under it. However, our sockbuf can't handle concurrent
> :accesses, so we'd have to have multiple sockbufs (one per cpu) and then
> :the socket layer would have to pull data from all of them (probably in a
> :round-robin fashion). UDP does not guarantee in-order delivery but, since
> :datagrams typically do arrive in order, I'm not sure how well apps would cope
> :with reordering. On top of that we'd need to decide what to do about buffer size limits
> :and whether the sockbufs should stay in struct socket.
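The per-cpu sockbuf idea can be sketched as follows, assuming hypothetical types (`struct pcpu_sockbuf`, `struct rr_socket`) and ignoring locking and memory ordering entirely: each cpu's protocol thread would append only to its own queue, and the receive path scans the queues round-robin, starting after the cpu it last served. It also shows the reordering concern: three datagrams numbered in arrival order, with 1 and 2 queued on cpu 0 and 3 on cpu 2, come out as 1, 3, 2.

```c
#include <assert.h>
#include <string.h>

#define NCPUS 4
#define QLEN  16                        /* power of two, so unsigned wraparound is harmless */

struct dgram {
    int id;
};

/* Hypothetical: one receive queue per cpu; only that cpu's protocol thread appends. */
struct pcpu_sockbuf {
    struct dgram q[QLEN];
    unsigned head, tail;                /* head == tail: empty */
};

struct rr_socket {
    struct pcpu_sockbuf sb[NCPUS];
    int next_cpu;                       /* queue to try first on the next receive */
};

static void so_init(struct rr_socket *so)
{
    memset(so, 0, sizeof(*so));
}

/* Protocol-thread side: queue a datagram, or drop it if this cpu's queue is full. */
static int sb_append(struct pcpu_sockbuf *sb, struct dgram d)
{
    assert(sb->tail - sb->head <= QLEN);
    if (sb->tail - sb->head == QLEN)
        return -1;
    sb->q[sb->tail % QLEN] = d;
    sb->tail++;
    return 0;
}

/* Socket-layer side: take one datagram, visiting the per-cpu queues round-robin. */
static int so_receive(struct rr_socket *so, struct dgram *out)
{
    for (int n = 0; n < NCPUS; n++) {
        int cpu = (so->next_cpu + n) % NCPUS;
        struct pcpu_sockbuf *sb = &so->sb[cpu];

        if (sb->head != sb->tail) {
            *out = sb->q[sb->head % QLEN];
            sb->head++;
            so->next_cpu = (cpu + 1) % NCPUS;
            return 0;
        }
    }
    return -1;                          /* all queues empty */
}
```

Buffer size limits are left open here, as in the mail: each queue has its own fixed capacity, so a per-socket limit would still need a policy.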
[...]
> Our sockbufs need a general SMP solution. I think a spinlock
> may be best due to the concurrency.
That's so boring :) (and AFAICT unnecessary).
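For comparison, the spinlock alternative amounts to one lock around the shared sockbuf state. A minimal sketch with C11 atomics, assuming a hypothetical `struct locked_sockbuf` that only accounts bytes against a limit (a real sockbuf also holds the mbuf chain). Every producer cpu and the reader contend on this single lock, which is the contention a per-cpu layout would sidestep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Test-and-set spinlock built on C11 atomics. */
struct spin {
    atomic_int locked;                  /* 0 = free, 1 = held */
};

static void spin_init(struct spin *s)
{
    atomic_init(&s->locked, 0);
}

static bool spin_trylock(struct spin *s)
{
    /* acquire: accesses in the critical section cannot move above this */
    return atomic_exchange_explicit(&s->locked, 1, memory_order_acquire) == 0;
}

static void spin_lock(struct spin *s)
{
    while (!spin_trylock(s)) {
        /* wait with plain loads until the holder releases, then retry */
        while (atomic_load_explicit(&s->locked, memory_order_relaxed) != 0)
            ;
    }
}

static void spin_unlock(struct spin *s)
{
    atomic_store_explicit(&s->locked, 0, memory_order_release);
}

/* Hypothetical sockbuf that only accounts bytes against a limit. */
struct locked_sockbuf {
    struct spin lock;
    size_t queued;                      /* invariant: queued <= limit */
    size_t limit;
};

static void lsb_init(struct locked_sockbuf *sb, size_t limit)
{
    spin_init(&sb->lock);
    sb->queued = 0;
    sb->limit = limit;
}

/* Producer side (any cpu): reserve room for a datagram, or fail so UDP drops it. */
static bool lsb_reserve(struct locked_sockbuf *sb, size_t len)
{
    bool ok;

    spin_lock(&sb->lock);
    ok = len <= sb->limit - sb->queued;
    if (ok)
        sb->queued += len;
    spin_unlock(&sb->lock);
    return ok;
}

/* Consumer side: the reader drained len bytes. */
static void lsb_release(struct locked_sockbuf *sb, size_t len)
{
    spin_lock(&sb->lock);
    assert(len <= sb->queued);
    sb->queued -= len;
    spin_unlock(&sb->lock);
}
```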
> I'd say we should get it working first, and then worry about optimizing
> it.
Well, sure, but it helps if we have a vague idea about where we're headed,
don't you think? So I guess the short-term roadmap should be:
a) do tcpcb as described in a previous mail
b) make the UDP inpcbs per-cpu, hashing on l{addr,port}
c) deal with struct socket accesses
d) remove the BGL from TCP/UDP
e) fix bugs
f) fix more bugs
g) measure
h) work on interesting stuff
Well, not necessarily in that order ;)
Aggelos