[netmp] socket accesses
Matthew Dillon
dillon at apollo.backplane.com
Fri Aug 8 19:19:58 PDT 2008
Generally speaking what I am recommending is that races be allowed
insofar as they don't interfere with operation, which will be in
most cases. The cases where it would interfere are primarily going
to be related to userland blocking conditions (e.g. read() on socket
blocks waiting for incoming data/disconnect/shutdown, or write() on
socket blocks waiting for buffer space to become available).
These blocking conditions can be resolved by sending a synchronous
message to the protocol stack, just like we do now for connect() and
numerous other operations. The protocol stack would then be able
to resolve any races that occured, potentially returning the message
(unblocking the connection) immediately in the rare case of a race.
---
Ok, here are my suggestions:
ISCONNECTED/ISCONNECTING:
* These should only be set or cleared by the protocol stack,
except for a degenerate case in unp_connect2() (for socketpair()
I think).
* While it is true that once one is set, then at least one or the
other will be found to be set under normal operation. But
remote disconnection will still race, and that's an issue.
* I believe that all such races can be resolved in the ssb_wait()
code. Instead of setting a flag and tsleep()ing this code could
message the protocol stack (the same way we do for e.g. connect()).
The protocol stack would then resolve the races, potentially
returning the message immediately if .e.g. both bits were found
to be clear.
CANTRCVMORE:
* Set by both sides. Should be ok as long as any races are resolved.
In the case of userland, it could set CANTRCVMORE and send an
async message to the protocol stack.
In the case of the protocol stack, it could set CANTRCVMORE and
then drain any wait-style messages (such as sent by soreceive()
when it blocks via ssb_wait(), using the new idea for ssb_wait()).
:WRT SS_NOFDREF:
:
:- it is set on error by socreate(), but at this point the socket is not
: reachable yet
:...
:So, unless I'm missing something, SS_NOFDREF is only modified before the socket
:becomes relevant or after the socket is no longer reachable.
Not sure about this one.
:Moving to so_incomp:
:
:it's modified by sofree() (only called by protocol thread), sonewconn (same) and
:soisconnected(). The latter is called from process context (fifo_open,
:portal_connect at least, maybe sctp too) and from the bluetooth netisr.
:
:->so_comp is modified by soclose (called from process context), soisconnected,
:sonewconn (see above), soaccept_predicate, sctp_get_peeloff (of course!),
:ng_ksocket_finish_accept().
:
:Assuming we can move all list modification in protocol thread context, the
:list traversal would still be tricky. Maybe a spinlock protecting the lists and
:queue lengths is in order for now? The same lock could be reused for protecting
:SS_INCOMP, SS_COMP. In the future we might try something more clever if this
:becomes a performance issue. Opinions?
:
:The other so_* fields seem to be easier; I'll send a separate mail for those.
:
:TIA,
:Aggelos
I think so_incomp is just used on listening sockets, right? Protecting
it with a spinlock is probably just fine.
The so_state flags themselves would have to be set and cleared with
atomic ops. Any situation which requires notifying the protocol stack,
such as when userland sets CANTRCVMORE, can be done with a 'checkstate'
style message to the protocol stack. The message would not contain the
flag, userland would simply set the flag and then tell the protocol
stack to check its flags state and do whatever needs to be done.
How about read()/write()/send()/recv()/etc? That is, so_rcvbuf and
so_sndbuf operation? I'm thinking the mbuf chains are going to have
to be protected with a lock of some sort. It seems like it should work,
except when an mbuf must be split up... I think even that could be made
to work if the new mbuf is pre-allocated/pre-cached so the split can
occur inside the spinlock protection.
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the Kernel
mailing list