git: udp: Make udp pcbinfo and portinfo per-cpu; greatly improve performance
Sepherosa Ziehau
sephe at crater.dragonflybsd.org
Sun Aug 31 01:08:16 PDT 2014
commit be4519a228f0cdc3d23bcbc147abcf2e7d27f4f7
Author: Sepherosa Ziehau <sephe at dragonflybsd.org>
Date: Thu Jul 3 21:15:27 2014 +0800
udp: Make udp pcbinfo and portinfo per-cpu; greatly improve performance
MAJOR CHANGES:
- Add a token to protect pcbinfo's inpcb list and wildcard hash table.
Currently only the udp per-cpu pcbinfo sets this token. The udp
serializer and netisr barrier are removed.
o udp inpcb list:
In most cases, the udp inpcb list is operated on in its owner netisr.
However, it is also accessed and modified (though no effective udp
inpcb will be unlinked) in netisr0 to adjust multicast options when
an interface is about to be detached. So access to and modification
of the udp inpcb list must be protected w/ the token.
At udp inpcb detach time, the udp inpcb is first removed from the
udp inpcb list, then a message will go through all netisrs, which
makes sure that no netisrs are using or can find this udp inpcb
from the udp inpcb list. After all of this, the udp inpcb is
destroyed in its owner netisr.
In netisrs, it is MP safe to find a udp inpcb from udp inpcb list,
then release the token and process the found udp inpcb.
In other threads, it is MP safe to find a udp inpcb from udp inpcb
list, then release the token and process the found udp inpcb in
non-blocking fashion.
See also the usage of inpcb marker.
o udp wildcard hash table:
On the input path, the udp wildcard hash table is searched in its
owner netisr. In order to ease implicit binding (bind during send),
connect after binding, and disconnect, udp inpcbs are inserted
into and removed from other udp pcbinfos' wildcard hash tables in
their owner netisr. Thus the udp wildcard hash table must be
protected w/ the token.
At udp inpcb detach time, a message will go through all netisrs,
and this udp inpcb will be removed from the udp wildcard hash
table belonging to the current netisr. This makes sure that once
the current netisr runs the message handler, this udp inpcb can no
longer be used or found in the current netisr. When the message
reaches the last netisr, this udp inpcb is redispatched to its
owner netisr to be destroyed.
In netisrs, it is MP safe to find a udp inpcb from udp wildcard
hash table, then release the token and process the found udp inpcb,
e.g. use udp inpcb found by in_pcblookuphash().
In other threads, it is MP safe to find a udp inpcb from udp
wildcard hash table, then release the token and process the found
udp inpcb in non-blocking fashion.
See also the usage of inpcb container marker.
o udp connect hash table:
It is lockless MP safe, and only accessed and modified in its owner
netisr.
- During inpcb iteration through the inpcb list, use an inpcb marker
when calling functions that may block, e.g. in_pcbpurgeif0(), so that
the iteration does not stop prematurely if the inpcb being processed
is removed from the inpcb list.
- Use the udp inpcb wildcard hash table and udp inpcb connect hash
table to dispatch input multicast and broadcast udp datagrams. Using
the udp inpcb list could be time consuming, since we would need to
check the udp inpcb lists on all cpus; moreover, once a udp inpcb has
a local port, it will be in either the udp wildcard hash table or the
udp connect hash table.
Since the socket buffer operation on the input path may block, an
inpcb container marker is used when iterating inpcbs from the udp
inpcb wildcard hash table. in_pcblookup_pkthash() is adjusted to skip
inpcb container markers.
- udp socket so_port is no longer fixed to netisr0 msgport
o Initial udp socket so_port is the current cpu's netisr msgport.
o Bound but unconnected udp socket so_port is selected according to
local port hash.
o Connected udp socket so_port is selected according to the udp hash,
i.e. laddr/faddr toeplitz hash (exception: a multicast laddr or
multicast faddr is hashed to netisr0).
o Multicast socket options are forced to be handled in netisr0, since
udp socket so_port may not be netisr0 msgport.
- In order to support asynchronous udp inpcb detach:
o EJUSTRETURN from the pru_detach method now means the protocol will
call sodiscard() and sofree() for soclose(). The udp pru_detach
method returns EJUSTRETURN as of this commit.
o The SS_ISCLOSING socket state is set before calling the pru_detach
method, so the protocol can avoid certain expensive, unnecessary or
disallowed operations in the pru_disconnect or pru_detach method,
e.g. the udp pru_disconnect method avoids putting the udp inpcb back
into the udp wildcard hash table if SS_ISCLOSING is set.
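The so_port selection rules listed above can be sketched in user space
as follows. This is only an illustration under stated assumptions, not
the code from this commit: the names udp_sock_state, udp_pick_cpu(),
is_multicast() and addr_hash() are hypothetical, the local port hash is
assumed to be a plain modulo, and addr_hash() is a simple stand-in
mixer rather than the real toeplitz hash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical socket states; the names are illustrative only. */
enum udp_sock_state { UDP_UNBOUND, UDP_BOUND, UDP_CONNECTED };

/* IPv4 multicast is 224.0.0.0/4 (address given in host byte order). */
static int
is_multicast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/* Stand-in for the laddr/faddr hash; this is NOT the toeplitz hash. */
static uint32_t
addr_hash(uint32_t laddr, uint32_t faddr)
{
	uint32_t h = laddr ^ (faddr * 2654435761u);

	return h ^ (h >> 16);
}

/* Return the index of the netisr (cpu) whose msgport would own the socket. */
static int
udp_pick_cpu(enum udp_sock_state st, int curcpu, int ncpus,
    uint16_t lport, uint32_t laddr, uint32_t faddr)
{
	assert(ncpus > 0);

	switch (st) {
	case UDP_UNBOUND:
		/* Initial so_port: the current cpu's netisr. */
		return curcpu;
	case UDP_BOUND:
		/* Bound but unconnected: assumed local port hash. */
		return (int)(lport % (unsigned)ncpus);
	case UDP_CONNECTED:
		/* Multicast laddr or faddr is pinned to netisr0. */
		if (is_multicast(laddr) || is_multicast(faddr))
			return 0;
		return (int)(addr_hash(laddr, faddr) % (uint32_t)ncpus);
	}
	return 0;
}
```

For example, with 4 cpus a socket bound to local port 53 maps to cpu 1
under the assumed modulo hash, while any connected socket with a
multicast peer address maps to cpu 0.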
MISC CHANGES:
- pcbinfo's cpu id must be set now; -1 is disallowed.
- udp pru_abort method should never be called; it panics now.
- Restore traditional BSD behaviour when connect on an unbound udp
socket fails: if the local port of the udp socket has been selected,
its inpcb should be in the wildcard hash table, i.e. the udp inpcb
should be visible on the udp datagram input path.
- Make sure multicast options are adjusted only in netisr0 for inet6
when an interface is about to be detached.
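The inpcb marker technique mentioned under MAJOR CHANGES, which keeps
list iteration safe when a callee such as in_pcbpurgeif0() may block,
can be sketched in user space roughly as below. This is a minimal
sketch, not the kernel code: struct pcb, pcb_foreach() and
unlink_self() are hypothetical names, the list is a hand-rolled
circular list, and the token that protects the real list is not
modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element; a marker is an element with is_marker set. */
struct pcb {
	struct pcb *prev, *next;
	int is_marker;
	int id;
};

/* Circular doubly linked list with 'head' as the sentinel. */
static void
list_init(struct pcb *head)
{
	head->prev = head->next = head;
	head->is_marker = 1;
}

static void
insert_after(struct pcb *pos, struct pcb *n)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

static void
list_remove(struct pcb *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

typedef void (*pcb_fn)(struct pcb *head, struct pcb *cur);

/*
 * Visit every real element.  The marker is moved past the current
 * element before the callback runs, so the iteration position survives
 * even if the callback unlinks the current element.
 */
static int
pcb_foreach(struct pcb *head, pcb_fn fn)
{
	struct pcb marker = { .is_marker = 1 };
	struct pcb *cur;
	int visited = 0;

	assert(head != NULL && fn != NULL);

	insert_after(head, &marker);
	while ((cur = marker.next) != head) {
		list_remove(&marker);
		insert_after(cur, &marker);
		if (cur->is_marker)
			continue;	/* skip markers of other iterators */
		fn(head, cur);		/* may unlink cur */
		visited++;
	}
	list_remove(&marker);
	return visited;
}

/* Example callback: removes the element being processed. */
static void
unlink_self(struct pcb *head, struct pcb *cur)
{
	(void)head;
	list_remove(cur);
}
```

Because the iterator resumes from the marker rather than from a saved
pointer to the current element, removing that element inside the
callback does not cut the walk short.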
PERFORMANCE IMPROVEMENT:
For the 'kq_connect_client -u' test, this commit gives a roughly 400%
performance improvement (31Kconns/s -> 160Kconns/s).
Summary of changes:
sys/kern/uipc_msg.c | 3 +-
sys/kern/uipc_socket.c | 39 ++-
sys/net/ipfw/ip_fw2.c | 8 +-
sys/net/netmsg.h | 1 +
sys/net/pf/pf.c | 2 +-
sys/netinet/in.c | 6 +-
sys/netinet/in_pcb.c | 410 +++++++++++++++++--------
sys/netinet/in_pcb.h | 44 ++-
sys/netinet/in_proto.c | 11 +-
sys/netinet/ip_demux.c | 26 +-
sys/netinet/ip_divert.c | 6 +-
sys/netinet/ip_output.c | 19 ++
sys/netinet/raw_ip.c | 6 +-
sys/netinet/tcp_subr.c | 16 +-
sys/netinet/udp_usrreq.c | 731 +++++++++++++++++++++++++++++---------------
sys/netinet/udp_var.h | 10 +-
sys/netinet6/in6_ifattach.c | 44 ++-
sys/netinet6/in6_pcb.c | 81 ++++-
sys/netinet6/in6_pcb.h | 4 +-
sys/netinet6/ipsec.c | 2 +-
sys/netinet6/raw_ip6.c | 2 +-
sys/netinet6/udp6_usrreq.c | 44 ++-
sys/sys/protosw.h | 4 +
sys/sys/socketops.h | 2 +-
sys/sys/socketvar.h | 2 +
25 files changed, 1029 insertions(+), 494 deletions(-)
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/be4519a228f0cdc3d23bcbc147abcf2e7d27f4f7
--
DragonFly BSD source repository