[netmp] socket accesses

Matthew Dillon dillon at apollo.backplane.com
Fri Aug 8 19:19:58 PDT 2008

    Generally speaking what I am recommending is that races be allowed
    insofar as they don't interfere with operation, which will be in
    most cases.  The cases where it would interfere are primarily going
    to be related to userland blocking conditions (e.g. read() on socket
    blocks waiting for incoming data/disconnect/shutdown, or write() on
    socket blocks waiting for buffer space to become available).

    These blocking conditions can be resolved by sending a synchronous
    message to the protocol stack, just like we do now for connect() and
    numerous other operations.  The protocol stack would then be able
    to resolve any races that occured, potentially returning the message
    (unblocking the connection) immediately in the rare case of a race.


    Ok, here are my suggestions:


	* These should only be set or cleared by the protocol stack,
	  except for a degenerate case in unp_connect2() (for socketpair()
	  I think).

	* While it is true that once one is set, then at least one or the
	  other will be found to be set under normal operation.  But
	  remote disconnection will still race, and that's an issue.

	* I believe that all such races can be resolved in the ssb_wait()
	  code.  Instead of setting a flag and tsleep()ing this code could
	  message the protocol stack (the same way we do for e.g. connect()).

	  The protocol stack would then resolve the races, potentially
	  returning the message immediately if .e.g. both bits were found
	  to be clear.


	* Set by both sides.  Should be ok as long as any races are resolved.
	  In the case of userland, it could set CANTRCVMORE and send an
	  async message to the protocol stack.

	  In the case of the protocol stack, it could set CANTRCVMORE and
	  then drain any wait-style messages (such as sent by soreceive()
	  when it blocks via ssb_wait(), using the new idea for ssb_wait()).

:- it is set on error by socreate(), but at this point the socket is not
:  reachable yet
:So, unless I'm missing something, SS_NOFDREF is only modified before the socket
:becomes relevant or after the socket is no longer reachable.

    Not sure about this one.

:Moving to so_incomp:
:it's modified by sofree() (only called by protocol thread), sonewconn (same) and
:soisconnected(). The latter is called from process context (fifo_open,
:portal_connect at least, maybe sctp too) and from the bluetooth netisr.
:->so_comp is modified by soclose (called from process context), soisconnected,
:sonewconn (see above), soaccept_predicate, sctp_get_peeloff (of course!),
:Assuming we can move all list modification in protocol thread context, the
:list traversal would still be tricky. Maybe a spinlock protecting the lists and
:queue lengths is in order for now? The same lock could be reused for protecting
:SS_INCOMP, SS_COMP. In the future we might try something more clever if this
:becomes a performance issue. Opinions?
:The other so_* fields seem to be easier; I'll send a separate mail for those.

    I think so_incomp is just used on listening sockets, right?  Protecting
    it with a spinlock is probably just fine.

    The so_state flags themselves would have to be set and cleared with
    atomic ops.  Any situation which requires notifying the protocol stack,
    such as when userland sets CANTRCVMORE, can be done with a 'checkstate'
    style message to the protocol stack.  The message would not contain the
    flag, userland would simply set the flag and then tell the protocol
    stack to check its flags state and do whatever needs to be done.

    How about read()/write()/send()/recv()/etc?  That is, so_rcvbuf and
    so_sndbuf operation?  I'm thinking the mbuf chains are going to have
    to be protected with a lock of some sort.  It seems like it should work,
    except when an mbuf must be split up... I think even that could be made
    to work if the new mbuf is pre-allocated/pre-cached so the split can
    occur inside the spinlock protection.

					Matthew Dillon 
					<dillon at backplane.com>

More information about the Kernel mailing list