Initial messaging / message-port infrastructure is in

Sat Jul 19 18:56:51 PDT 2003

    I've committed the initial messaging/message-port infrastructure.  Here
    are the cvsweb links:

http://www.dragonflybsd.org/cgi-bin/cvsweb.cgi/src/sys/kern/lwkt_msgport.c
http://www.dragonflybsd.org/cgi-bin/cvsweb.cgi/src/sys/sys/msgport.h
http://www.dragonflybsd.org/cgi-bin/cvsweb.cgi/src/sys/sys/thread.h

http://www.dragonflybsd.org/cgi-bin/cvsweb.cgi/src/sys/kern/lwkt_msgport.c?rev=HEAD&content-type=text/x-cvsweb-markup
http://www.dragonflybsd.org/cgi-bin/cvsweb.cgi/src/sys/sys/msgport.h?rev=HEAD&content-type=text/x-cvsweb-markup
http://www.dragonflybsd.org/cgi-bin/cvsweb.cgi/src/sys/sys/thread.h?rev=HEAD&content-type=text/x-cvsweb-markup

    None of it has been tested, since nothing uses it yet (though DEV is
    about to start using it in a degenerate form), but I think people will
    be interested in seeing the comments I have been making about messaging
    and IPIs realized in this commit.

    This also shows off one of the big advantages of asynchronous IPI
    messaging.  The messaging and port functions do not need to obtain
    any mutexes (even if there were no MP lock), even when they wind up
    queueing or dequeueing something.  But that isn't the only advantage:
    because the messaging operations are combined with the scheduling
    operations, one IPI covers both the queueing and the scheduling
    aspects of a message.

    So right now a single IPI message saves us at least 8 mutex
    operations:

    Traditional mutex design                    LWKT/IPI design

    cpu1: get_queue_mtx                         cpu1: send IPIQ to target cpu
    cpu1: queue_message
    cpu1: rel_queue_mtx
    cpu1: get_sched_mtx
    cpu1: schedule_target_thread
    cpu1: rel_sched_mtx
    cpu1: (IPI the target cpu anyway
          in case it is idle??)

    cpu2: (receive wakeup IPI??)
    cpu2: get_sched_mtx                         cpu2: receive IPIQ
    cpu2: locate_scheduled_thread               cpu2: queue message
    cpu2: rel_sched_mtx                         cpu2: schedule target thread
    cpu2: get_queue_mtx                         cpu2: dequeue message
    cpu2: dequeue_message                       cpu2: execute message function
    cpu2: rel_queue_mtx
    cpu2: execute message function

    Even if I were generous and ignored the fact that even in a mutexed 
    system cpu1 might want to wake up cpu2 with an IPI, the fact remains
    that in a mutexed system *8* mutex operations are executed before
    the message can be acted upon, and in the LWKT system *NO* mutex
    operations are executed before the message can be acted upon.
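
    To make the LWKT/IPI column above concrete, here is a rough,
    single-threaded userspace model of it.  This is NOT the code in
    lwkt_msgport.c: every name in it (ipiq_send, ipiq_process, port_run,
    struct ipiq and so on) is made up for illustration, and the "IPI" is
    just a function call.  The point it tries to show is that the sender
    only writes into a single-producer/single-consumer queue, and the
    target cpu does the queueing and the scheduling in one pass, so no
    step needs a mutex.

```c
/* Userspace model only; all names are hypothetical, not the kernel API. */
#include <assert.h>
#include <stddef.h>

#define IPIQ_SIZE 8

struct msg {
    struct msg *next;
    void (*func)(struct msg *);   /* the "message function" */
    int payload;
    int done;
};

struct port {                     /* touched only by the owning cpu */
    struct msg *head, **tailp;
    int owner_scheduled;          /* stands in for "target thread is runnable" */
};

struct ipiq_ent { struct port *port; struct msg *msg; };

struct ipiq {                     /* one per (sender cpu, target cpu) pair */
    struct ipiq_ent slot[IPIQ_SIZE];
    unsigned windex;              /* written only by the sender */
    unsigned rindex;              /* written only by the target */
};

static void port_init(struct port *p)
{
    p->head = NULL;
    p->tailp = &p->head;
    p->owner_scheduled = 0;
}

/* cpu1: the only step on the sending side.  Returns 0 if the queue is full.
 * A real SMP version would need a write barrier before publishing windex
 * and would then raise the hardware IPI; this model is single-threaded. */
static int ipiq_send(struct ipiq *q, struct port *port, struct msg *m)
{
    if (q->windex - q->rindex == IPIQ_SIZE)
        return 0;
    q->slot[q->windex % IPIQ_SIZE].port = port;
    q->slot[q->windex % IPIQ_SIZE].msg = m;
    q->windex++;
    return 1;
}

/* cpu2: IPI handler.  Queues each message AND schedules the owner. */
static int ipiq_process(struct ipiq *q)
{
    int n = 0;
    while (q->rindex != q->windex) {
        struct ipiq_ent *e = &q->slot[q->rindex % IPIQ_SIZE];
        struct port *p = e->port;
        struct msg *m = e->msg;
        q->rindex++;
        m->next = NULL;
        *p->tailp = m;            /* queue message */
        p->tailp = &m->next;
        p->owner_scheduled = 1;   /* schedule target thread */
        n++;
    }
    return n;
}

/* cpu2: the port's owner thread dequeues and executes. */
static int port_run(struct port *p)
{
    int n = 0;
    struct msg *m;
    while ((m = p->head) != NULL) {
        p->head = m->next;
        if (p->head == NULL)
            p->tailp = &p->head;
        m->func(m);
        n++;
    }
    p->owner_scheduled = 0;
    return n;
}

static void double_payload(struct msg *m) { m->payload *= 2; m->done = 1; }

/* Walks one message through every step of the LWKT/IPI column. */
int demo_ipi_message(void)
{
    struct ipiq q = {0};
    struct port p;
    struct msg m = { NULL, double_payload, 21, 0 };

    port_init(&p);
    int sent = ipiq_send(&q, &p, &m);     /* cpu1: send IPIQ */
    int queued = ipiq_process(&q);        /* cpu2: receive, queue, schedule */
    int scheduled = p.owner_scheduled;
    int ran = port_run(&p);               /* cpu2: dequeue, execute */
    assert(sent == 1 && queued == 1 && scheduled == 1 && ran == 1 && m.done);
    return m.payload;
}
```

    Calling demo_ipi_message() pushes one message through all five steps
    of the right-hand column and returns the doubled payload.  Because
    windex has exactly one writer and rindex has exactly one writer,
    neither side ever has to lock the queue against the other.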

    The question then becomes: is the IPI latency (which costs neither cpu
    any actual cycles but can take a 'long' time, e.g. 1uS) plus the IPI
    interrupt overhead (which does eat cycles on cpu2) worth the 8
    potentially *CONTESTED* mutex operations we just saved?  I would say:
    probably.

    And then add in the other efficiencies.  For example, lwkt_getport()
    and lwkt_waitport() (basically ALL message processing done by the
    thread that owns the message port in question) require only a critical
    section to manipulate the port.  No mutexes, no IPIs, no nothing.
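
    Why a critical section is enough can be sketched the same way.  The
    model below is again hypothetical userspace code, not lwkt_getport()
    itself; xport_getport, xport_ipi_deliver, crit_enter and crit_exit
    here are illustrative stand-ins.  It assumes that a critical section
    simply holds off the local cpu's IPI handler until the section exits,
    and that only the owning cpu ever touches the port's queue.

```c
/* Userspace model only; all names are hypothetical, not the kernel API. */
#include <assert.h>
#include <stddef.h>

struct xmsg { struct xmsg *next; int id; };

struct xqueue { struct xmsg *head, **tailp; };

struct xport {
    struct xqueue ready;      /* what the owner thread dequeues from */
    struct xqueue deferred;   /* IPI deliveries held off by a critical section */
    int crit_count;           /* >0 while the owner is in a critical section */
};

static void q_init(struct xqueue *q) { q->head = NULL; q->tailp = &q->head; }

static void q_put(struct xqueue *q, struct xmsg *m)
{
    m->next = NULL;
    *q->tailp = m;
    q->tailp = &m->next;
}

static struct xmsg *q_get(struct xqueue *q)
{
    struct xmsg *m = q->head;
    if (m != NULL) {
        q->head = m->next;
        if (q->head == NULL)
            q->tailp = &q->head;
        m->next = NULL;
    }
    return m;
}

static void xport_init(struct xport *p)
{
    q_init(&p->ready);
    q_init(&p->deferred);
    p->crit_count = 0;
}

static void crit_enter(struct xport *p) { p->crit_count++; }

/* Leaving the outermost critical section runs whatever was held off. */
static void crit_exit(struct xport *p)
{
    struct xmsg *m;
    assert(p->crit_count > 0);
    if (--p->crit_count == 0) {
        while ((m = q_get(&p->deferred)) != NULL)
            q_put(&p->ready, m);
    }
}

/* Models the IPI handler on the owning cpu delivering a message. */
static void xport_ipi_deliver(struct xport *p, struct xmsg *m)
{
    if (p->crit_count > 0)
        q_put(&p->deferred, m);   /* held off until crit_exit() */
    else
        q_put(&p->ready, m);
}

/* Owner-side dequeue: a critical section, no mutex, no IPI. */
struct xmsg *xport_getport(struct xport *p)
{
    struct xmsg *m;
    crit_enter(p);
    m = q_get(&p->ready);
    crit_exit(p);
    return m;
}

int demo_getport(void)
{
    struct xport p;
    struct xmsg a = { NULL, 1 }, b = { NULL, 2 };

    xport_init(&p);
    xport_ipi_deliver(&p, &a);            /* outside a section: queued at once */
    crit_enter(&p);
    xport_ipi_deliver(&p, &b);            /* inside a section: held off */
    int only_a_visible = (p.ready.head == &a && a.next == NULL);
    crit_exit(&p);                        /* b becomes visible here */

    struct xmsg *m1 = xport_getport(&p);
    struct xmsg *m2 = xport_getport(&p);
    struct xmsg *m3 = xport_getport(&p);
    assert(only_a_visible);
    return (m1 == &a && m2 == &b && m3 == NULL) ? m1->id * 10 + m2->id : -1;
}
```

    In this model the owner never sees the ready queue change underneath
    it while it is inside the critical section, which is the whole
    protection it needs when every writer runs on the same cpu.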

    Now is it worth it?  I think so!

						-Matt