UDP inpcbs and more (was Re: tcpcb etc)
Aggelos Economopoulos
aoiko at cc.ece.ntua.gr
Sun Jun 8 07:25:40 PDT 2008
On Saturday 07 June 2008, Sepherosa Ziehau wrote:
> On Sat, Jun 7, 2008 at 3:16 AM, Aggelos Economopoulos
> <aoiko at cc.ece.ntua.gr> wrote:
> > On Friday 06 June 2008, Aggelos Economopoulos wrote:
> >> On Monday 05 May 2008, Aggelos Economopoulos wrote:
> > [...]
> >> > On second thought, let me ask for input sooner rather than later.
> > [...]
> >> OK, same thing, but now it's the pcbs. TCP is "easy".
> > [...]
> >> My plan is to start a discussion on the more interesting in_pcb
> >> situation on kernel@ this weekend.
> >
> > Currently, the inpcbs for UDP all live in a global hash table which is
> > protected only by (you guessed it) the BGL. The straightforward
> > way to avoid this would be to break up the table per cpu, a la TCP.
> > This presents two problems.
> >
> > First, UDP sockets can issue connect() multiple times (thus changing
> > faddr, fport) and can call bind() to go from a wildcard laddr to a specific one.
> > When this happens, the inpcb must be moved to another cpu. This shouldn't
> > be too hard to handle; just mark the old inpcb as BEING_DELETED and only
> > delete it after inserting the new inpcb. Dropping a few packets is expected
> > for UDP and this shouldn't happen very often anyway.
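
For reference, a rough sketch of the two-step move; every function and
flag name below is made up, only the idea is from the quoted paragraph:

struct inpcb;				/* opaque for this sketch */

/* Hypothetical helpers, not existing kernel functions. */
struct inpcb	*udp_inpcb_clone(struct inpcb *);
void		 udp_inpcb_mark_deleted(struct inpcb *);
void		 udp_inpcb_free(struct inpcb *);
void		 udp_pcbtable_insert(int cpu, struct inpcb *);
void		 udp_pcbtable_remove(struct inpcb *);

static void
udp_inpcb_move(struct inpcb *old_inp, int new_cpu)
{
	struct inpcb *new_inp;

	/* Stop new lookups from matching the old pcb. */
	udp_inpcb_mark_deleted(old_inp);	/* i.e. BEING_DELETED */

	/* Insert the new pcb (with the new addr/port tuple) first... */
	new_inp = udp_inpcb_clone(old_inp);
	udp_pcbtable_insert(new_cpu, new_inp);

	/*
	 * ...and only then remove and free the old one.  Packets that
	 * raced to the old pcb in this window get dropped, which is
	 * acceptable for UDP and should be rare anyway.
	 */
	udp_pcbtable_remove(old_inp);
	udp_inpcb_free(old_inp);
}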
>
> The following are my vague thoughts:
> Add per-CPU state to the UDP inpcb. There are a few things I think the
> per-CPU state should have: an input sockbuf, the addr/port pair and a
> valid bit. The valid bit is on if the UDP socket is not connected, or if
> the socket is connected and the {l,f}{addr,port} hash value equals the
> state's owner CPU. It is mainly for input packet validation.
That would be one way to do it, especially if we want to go in the per-cpu
sockbuf direction.
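To make sure I'm reading you right, here's roughly what I picture; all
the names (and the 4-tuple hash) are invented for the sketch, not
existing code:

#include <sys/types.h>
#include <netinet/in.h>

struct sockbuf;			/* stand-in for the real sockbuf type */

/* Hypothetical hash over the 4-tuple. */
u_int	udp_tuple_hash(struct in_addr, u_short, struct in_addr, u_short);

/* One of these per cpu, hanging off the UDP inpcb. */
struct udp_percpu_state {
	struct sockbuf	*ups_rcvbuf;	/* this cpu's input sockbuf */
	struct in_addr	 ups_laddr;	/* local address */
	u_short		 ups_lport;	/* local port */
	struct in_addr	 ups_faddr;	/* foreign address, 0 if unconnected */
	u_short		 ups_fport;	/* foreign port, 0 if unconnected */
	int		 ups_valid;	/* cached: deliver on this cpu? */
};

/*
 * What the valid bit should hold: on if the socket is not connected,
 * or if it is connected and the {l,f}{addr,port} hash names this cpu.
 */
static int
udp_state_valid(const struct udp_percpu_state *ups, int cpuid, int ncpus)
{
	if (ups->ups_fport == 0)	/* not connected */
		return (1);
	return (udp_tuple_hash(ups->ups_laddr, ups->ups_lport,
	    ups->ups_faddr, ups->ups_fport) % ncpus == cpuid);
}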
> Hashing should be quite straightforward for a connected UDP socket. For
> an unconnected UDP socket, f{addr,port} comes from the user. The problem
> is the lport upon the first send, and the laddr for an unbound UDP socket.
Hrmph. Indeed.
> The laddr for an unbound UDP socket could possibly be handled by having
> a per-CPU route cache.
This sounds doable.
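Something like a one-entry cache per cpu, say (all names hypothetical;
the real thing would presumably cache the struct route itself):

#include <netinet/in.h>

#define UDP_MAXCPU	32	/* placeholder for the kernel's MAXCPU */

/* Hypothetical stand-in for the routing lookup + source selection. */
struct in_addr	udp_route_srcaddr(struct in_addr faddr);

struct udp_rtcache {
	struct in_addr	urc_faddr;	/* destination this entry matches */
	struct in_addr	urc_laddr;	/* source address the route chose */
};

static struct udp_rtcache udp_rtcache_pcpu[UDP_MAXCPU];

/* Runs on the owning cpu only, so no locking is needed. */
static struct in_addr
udp_pick_laddr(struct in_addr faddr, int cpuid)
{
	struct udp_rtcache *urc = &udp_rtcache_pcpu[cpuid];

	if (urc->urc_faddr.s_addr != faddr.s_addr) {
		urc->urc_faddr = faddr;
		urc->urc_laddr = udp_route_srcaddr(faddr);
	}
	return (urc->urc_laddr);
}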
> We could also let lport determination happen only on
> CPU0; this is assumed to be time consuming, but in the worst case it
> would only happen for the first several packets in the lifetime of
> the socket.
You mean sending an IPI? That would add significant latency to UDP
ping-pongs, which would affect DNS queries. I think it would be measurable.
Perhaps we can again partition the port space per cpu and only use
IPIs when local ports run out. Doesn't sound too elegant :(
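The partitioning could look roughly like this (made-up names; assumes
the lport hash is simply lport % ncpus):

#include <sys/types.h>

#define UDP_EPHEMERAL_FIRST	49152
#define UDP_EPHEMERAL_LAST	65535

/*
 * Hand out only local ports that hash (here: modulo) back to this
 * cpu, so reply traffic lands on the cpu owning the socket.  Returns
 * 0 when this cpu's slice is exhausted and we'd have to fall back to
 * an IPI (or borrow from a neighbour's slice).
 */
static u_short
udp_lport_next(u_short prev, int cpuid, int ncpus)
{
	u_int port = (prev != 0) ? (u_int)prev + ncpus : UDP_EPHEMERAL_FIRST;

	/* Align the candidate on this cpu's residue class. */
	port += (cpuid + ncpus - port % ncpus) % ncpus;
	if (port > UDP_EPHEMERAL_LAST)
		return (0);
	return ((u_short)port);
}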
> There is no need to add an output sockbuf to the per-CPU state, and the
> input sockbuf does not need any lock. We simply bypass the socket's (so's) input sockbuf.
>
> >
> > Then there is the more interesting issue of how to hash. As described above,
> > the lport is the only field we can be sure is not a wildcard. Now consider
> > a UDP (say DNS) server; such a server does not normally connect(), so whatever
> > hash function we choose, the inpcb is going to end up on one cpu. This is the
> > cpu we would normally dispatch an incoming UDP packet to. The thing is, all
> > datagrams for our UDP server will end up going through the same cpu. So our
> > busy DNS server just can't scale: using only one protocol thread is going to
> > be a bottleneck.
>
> This falls under the {wildcard,bound} laddr and bound lport case; use the
> per-CPU UDP states I mentioned above, and in the normal use case the
> workload should be able to be distributed.
Right.
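I.e. on input, something like this (hypothetical, reusing the made-up
4-tuple hash from the earlier sketch):

#include <sys/types.h>
#include <netinet/in.h>

/* Same hypothetical 4-tuple hash as before. */
u_int	udp_tuple_hash(struct in_addr, u_short, struct in_addr, u_short);

/*
 * For an unconnected (wildcard) socket every cpu's state is valid, so
 * the packet's own addr/port tuple picks the cpu, and each cpu appends
 * to its private input sockbuf.  A busy DNS server's load then spreads
 * over all the protocol threads instead of funnelling through one.
 */
static int
udp_dispatch_cpu(struct in_addr laddr, u_short lport,
    struct in_addr faddr, u_short fport, int ncpus)
{
	return ((int)(udp_tuple_hash(laddr, lport, faddr, fport) % ncpus));
}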
Thanks for your insight,
Aggelos