UDP inpcbs and more (was Re: tcpcb etc)

Aggelos Economopoulos aoiko at cc.ece.ntua.gr
Sun Jun 8 07:25:40 PDT 2008


On Saturday 07 June 2008, Sepherosa Ziehau wrote:
> On Sat, Jun 7, 2008 at 3:16 AM, Aggelos Economopoulos
> <aoiko at cc.ece.ntua.gr> wrote:
> > On Friday 06 June 2008, Aggelos Economopoulos wrote:
> >> On Monday 05 May 2008, Aggelos Economopoulos wrote:
> > [...]
> >> > On second thought, let me ask for input sooner rather than later.
> > [...]
> >> OK, same thing, but now it's the pcbs. TCP is "easy".
> > [...]
> >> My plan is to start a discussion on the more interesting in_pcb
> >> situation on kernel@ this weekend.
> >
> > Currently, inpcbs for UDP are all in a global hash table which is
> > only protected by (you guessed it) the BGL. The straightforward
> > way to avoid this would be to break the table a la TCP. This presents
> > two problems.
> >
> > First, UDP sockets can issue connect() multiple times (thus changing
> > faddr, fport) and call bind to go from wildcard laddr to a specific one.
> > When this happens, the inpcb must be moved to another cpu. This shouldn't
> > be too hard to handle; just mark the old inpcb as BEING_DELETED and only
> > delete it after inserting the new inpcb. Dropping a few packets is expected
> > for UDP and this shouldn't happen very often anyway.
> 
> Following are my vague thoughts:
> Add per-CPU state to the UDP inpcb.  Things I think the per-CPU
> state should have: an input sockbuf, the addr/port pair, and a valid
> bit.  The valid bit is on if the UDP socket is not connected, or if
> the UDP socket is connected and the {l,f}{port,addr} hash value
> equals the state's owner CPU.  This is mainly for input packet
> validation.

That would be one way to do it, especially if we want to go in the per-cpu
sockbuf direction.
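
To make that concrete, here is a rough sketch of what such per-CPU
state might look like (all names are made up for illustration; this
is not code from the tree):

/*
 * Hypothetical per-cpu UDP pcb state, one instance per cpu.
 * The valid bit is computed as you describe: always set for an
 * unconnected socket, and for a connected one set only on the cpu
 * selected by the {l,f}{port,addr} hash.
 */
struct udp_inpcb_percpu {
        struct sockbuf  upc_inputsb;    /* per-cpu input sockbuf */
        struct in_addr  upc_laddr;      /* local address */
        struct in_addr  upc_faddr;      /* foreign address */
        u_short         upc_lport;      /* local port */
        u_short         upc_fport;      /* foreign port */
        int             upc_valid;      /* this cpu may accept input */
};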

> Hashing should be quite straightforward for a connected UDP socket.
> For an unconnected UDP socket, f{addr,port} comes from the user.  The
> problem is the lport upon the first send, and the laddr for an
> unbound UDP socket.

Hrmph. Indeed.
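
For the connected case at least, the hash could simply mix the full
4-tuple; a sketch (the mix and the name udp_tuple_cpu() are made up):

/*
 * Pick the owner cpu for a connected UDP pcb by hashing the
 * laddr/lport/faddr/fport 4-tuple.  Sketch only, not tuned.
 */
static int
udp_tuple_cpu(in_addr_t faddr, in_port_t fport,
              in_addr_t laddr, in_port_t lport)
{
        uint32_t hash;

        hash = faddr ^ laddr ^ ((uint32_t)fport << 16) ^ lport;
        hash ^= hash >> 16;
        return (hash % ncpus);  /* ncpus: global cpu count */
}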

> The laddr for an unbound UDP socket could possibly be handled by
> having a per-CPU route cache.

This sounds doable.
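
Something like consulting a per-cpu route cache on the first send, so
the laddr can be derived without leaving the current cpu; a sketch,
where udp_rtcache_lookup() is a hypothetical per-cpu lookup:

/*
 * Derive the laddr for an unbound UDP socket from this cpu's
 * route cache.  Falls back to INADDR_ANY on a cache miss.
 */
static struct in_addr
udp_pick_laddr(struct in_addr faddr)
{
        struct rtentry *rt;
        struct in_addr laddr;

        laddr.s_addr = INADDR_ANY;
        rt = udp_rtcache_lookup(mycpuid, faddr);        /* hypothetical */
        if (rt != NULL)
                laddr = ((struct sockaddr_in *)
                    rt->rt_ifa->ifa_addr)->sin_addr;    /* ifaddr's address */
        return (laddr);
}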

> We could also let lport determination happen only on CPU0.  This is
> assumed to be time consuming, but in the worst case it would only
> happen for the first several packets in the lifetime of the socket.

You mean sending an IPI? That would add significant latency to UDP
ping-pongs, which would affect DNS queries. I think it would be measurable.

Perhaps we can again partition the port space per cpu and only use
IPIs when local ports run out. Doesn't sound too elegant :(
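
Roughly, each cpu would own a slice of the ephemeral range and only
fall back to an IPI when its slice runs dry. A sketch, with made-up
constants and a hypothetical udp_lport_inuse() local-table check:

#define EPHEMERAL_FIRST 49152           /* illustrative range */
#define EPHEMERAL_LAST  65535

/*
 * Allocate an lport from this cpu's private slice.  Returns 0 when
 * the slice is exhausted; the caller would then IPI another cpu to
 * steal from a non-local slice.
 */
static u_short
udp_alloc_lport(int cpuid)
{
        int slice = (EPHEMERAL_LAST - EPHEMERAL_FIRST + 1) / ncpus;
        int base = EPHEMERAL_FIRST + cpuid * slice;
        int port;

        for (port = base; port < base + slice; ++port) {
                if (!udp_lport_inuse(port))     /* hypothetical */
                        return ((u_short)port);
        }
        return (0);     /* slice exhausted; fall back to IPI */
}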

> There is no need to add an output sockbuf to the per-CPU state, and
> the input sockbuf does not need any lock.  We simply skip the
> socket's input sockbuf.
> 
> >
> > Then there is the more interesting issue of how to hash. As described above,
> > the lport is the only field we can be sure is not a wildcard. Now consider
> > a UDP (say DNS) server; such a server does not normally connect() so whatever
> > hash function we choose, the inpcb is going to end up on one cpu. This is the
> > cpu we would normally dispatch an incoming UDP packet to. The thing is, all
> > datagrams for our UDP server will end up going through the same cpu. So our
> > busy DNS server just can't scale: using only one protocol thread is going to
> > be a bottleneck.
> 
> This falls under the {wildcard,bound} laddr and bound lport case;
> using the per-CPU UDP states I mentioned above, the workload should
> be distributable in a normal use case.

Right.
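
To spell it out, the input path on each cpu would then look roughly
like this (a sketch; udp_pcblookup_percpu(), udp_append_percpu() and
the field names are hypothetical):

/*
 * Per-cpu UDP input: the dispatcher already hashed the packet to
 * this cpu, so consult only the local per-cpu state.  Since every
 * cpu holds a valid copy for an unconnected (wildcard) pcb, a busy
 * DNS server's load spreads across all protocol threads.
 */
static void
udp_input_percpu(struct mbuf *m, struct in_addr laddr, u_short lport)
{
        struct udp_inpcb_percpu *upc;

        upc = udp_pcblookup_percpu(mycpuid, laddr, lport);
        if (upc != NULL && upc->upc_valid)
                udp_append_percpu(upc, m);      /* local input sockbuf */
        else
                m_freem(m);                     /* stale state; drop */
}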

Thanks for your insight,
Aggelos