syscall messaging interface API

Matthew Dillon dillon at
Wed Jul 23 12:27:12 PDT 2003

    Here is my idea for the system messaging interface.  I will use a
    new trap gate (0x81) to implement it, because it occurs to me that
    a message interface really ought to pass and return information in 
    registers rather than on the stack (since the message itself is already
    in user memory we might as well only have to do the copyin() on the
    contents rather than on both the system messaging interface arguments
    and the contents of the message).  And a new trap gate isolates us from
    the old syscall mechanism.

    int 0x81 to dispatch, arguments in eax, ecx, edx, return value in eax.

    error = sendsys(port, msg, msgsize)

	eax:error = int0x81(eax:port, ecx:msg, edx:msgsize)

	Send a syscall message to the kernel.  The userland requests 
	asynchronous or synchronous operation through the standard message
	flag MSGF_ASYNC.  The userland specifies userland pointers to the
	userland version of the system port, the userland version of the
	message, and the size of the message.

	The kernel copyin()'s the message and acts on it, and returns either
	a synchronous or an asynchronous error code as per our messaging
	API.  Results (like the return value for read() or lseek()) will be
	stored in the message structure.  Only error (errno) codes are
	returned in eax.
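
	A minimal userland sketch of these semantics, not the real
	interface: the message layout, the MSGF_ASYNC value, CMD_DOUBLE and
	sendsys_sim() are all hypothetical, and memcpy() stands in for
	copyin().  It only illustrates that the result is stored in the
	message while the return value carries nothing but an errno code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value; the post names MSGF_ASYNC but gives no number. */
#define MSGF_ASYNC 0x0001

/* Hypothetical message layout, not taken from the post. */
struct sysmsg {
    int   flags;       /* MSGF_ASYNC would request asynchronous operation */
    int   cmd;         /* which operation is requested (assumed field) */
    void *reply_port;  /* userland reply port, may be NULL */
    long  arg;         /* single illustrative argument */
    long  result;      /* results live in the message, not in eax */
};

enum { CMD_DOUBLE = 1 };   /* made-up operation for the demo */

/*
 * Stand-in for the kernel side of int 0x81.  A real kernel would
 * copyin() the user message; memcpy() models that step here.
 */
static int
sendsys_sim(void *port, struct sysmsg *umsg, size_t msgsize)
{
    struct sysmsg kmsg;

    (void)port;                          /* ignored initially, as in the post */
    if (umsg == NULL || msgsize != sizeof(kmsg))
        return EINVAL;
    memcpy(&kmsg, umsg, sizeof(kmsg));   /* the single "copyin" */
    if (kmsg.cmd != CMD_DOUBLE)
        return ENOSYS;
    umsg->result = kmsg.arg * 2;         /* result goes into the message */
    return 0;                            /* only an error code comes back */
}
```

	A caller would fill in a struct sysmsg, pass its address and
	sizeof() the structure, then read msg.result only if the call
	returned 0.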

	The kernel will initially ignore the userland version of the system
	port but eventually we can use this to store interface versioning
	information (so we don't have to load it into the message every time).

	The kernel utilizes the reply port stored in the message to return the
	message to userland.  The userland reply port may be NULL, in which
	case the kernel expects the userland to explicitly wait for the
	message to be returned or to poll for message completion passively,
	or the reply port may be non-NULL indicating that the kernel should
	return the message to the port.

	The reply port, if non-NULL, controls the action taken when a
	message is returned.  The action can be:

	* Queue without notification

	* Queue and perform an upcall to the (port specified) function

	* Queue and perform an upcall managed by a critical section (the
	  kernel would check to see if the user thread is in a critical
	  section and if so would just flag it.  The userland would later
	  detect that flag and flush the kernel's message queue).

	* ... any other action that we can think of, e.g. things like queue
	  with passive notification but revert to an upcall after a timeout
	  if the userland doesn't call flushsys().  etc.
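
	The first three actions above could be modelled roughly as follows.
	This is a sketch under assumptions: the enum names, the reply_port
	and uthread structures, and return_message() are invented for
	illustration and do not come from any existing header.

```c
#include <stddef.h>

enum port_action {              /* hypothetical names */
    PORT_QUEUE,                 /* queue without notification */
    PORT_UPCALL,                /* queue and call the port's function */
    PORT_UPCALL_CRIT            /* upcall unless in a critical section */
};

struct reply_port {
    enum port_action action;
    void (*upcall)(struct reply_port *);  /* port-specified function */
    int queued;                 /* messages queued on the port */
    int upcalls;                /* upcalls performed (demo counter) */
    int pending_flag;           /* set when an upcall was held back */
};

struct uthread {
    int crit_count;             /* > 0 while in a critical section */
};

/* Example upcall handler: just counts invocations. */
static void
count_upcall(struct reply_port *port)
{
    port->upcalls++;
}

/* What the "kernel" does when it returns a message to a reply port. */
static void
return_message(struct reply_port *port, const struct uthread *td)
{
    port->queued++;             /* every action queues the message */
    switch (port->action) {
    case PORT_QUEUE:
        break;
    case PORT_UPCALL:
        if (port->upcall != NULL)
            port->upcall(port);
        break;
    case PORT_UPCALL_CRIT:
        if (td->crit_count > 0)
            port->pending_flag = 1;   /* just flag it for later */
        else if (port->upcall != NULL)
            port->upcall(port);
        break;
    }
}
```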

    error = waitsys(port, msg)

	eax:error = int0x81(eax:port, ecx:msg, edx:0)

	Ask the kernel to block until a message has been returned, or until
	a message is pending on the specified (userland) message port, or

    error = flushsys()

	eax:error = int0x81(eax:NULL, ecx:NULL, edx:0)

	Ask the kernel to flush any pending messages that were held up due
	to userland being in a critical section.  The kernel will have
	flagged this to the userland and the userland will then call 
	flushsys() when it exits out of its last critical section.
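
	The userland side of that handshake might look something like the
	sketch below.  The ucrit structure, crit_enter(), crit_exit() and
	flushsys_sim() are hypothetical names; flushsys_sim() merely stands
	in for the real trap so the logic can be exercised in userland.

```c
struct ucrit {
    int depth;          /* critical section nesting depth */
    int flush_pending;  /* set by the "kernel" when it held a message back */
    int flush_calls;    /* how many times the flushsys stand-in ran */
};

/* Stand-in for flushsys(): int 0x81 with eax = NULL, ecx = NULL, edx = 0. */
static int
flushsys_sim(struct ucrit *uc)
{
    uc->flush_pending = 0;
    uc->flush_calls++;
    return 0;
}

static void
crit_enter(struct ucrit *uc)
{
    uc->depth++;
}

/* Flush only when leaving the outermost critical section. */
static void
crit_exit(struct ucrit *uc)
{
    if (--uc->depth == 0 && uc->flush_pending)
        flushsys_sim(uc);
}
```

	Leaving a nested section does nothing; only the exit that drops
	the depth to zero checks the flag and flushes.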

    I believe that this gives us the flexibility we need.  I have also come up
    with a novel solution for signaling!  The userland would queue
    'signal' messages to the kernel.  The kernel would then 'return' the
    appropriate signal message when the signal occurs.  This gives userland
    complete control (via the reply port) on how to deal with signals.

    Signal messages would be like continuous I/O requests.  The message would
    still be 'live' in the kernel even after it has 'returned' it to userland.
    The kernel would be free to return the message over and over again until
    the userland tells it to abort the signalling request.  

    The userland would interlock with the kernel by virtue of a flag bit 
    in the message or the reply port.  This coupled with a userland version
    of the critical section would interlock the return-from-softint 
    sequencing (i.e. so the kernel doesn't push an upcall on top of the same
    upcall that is in the middle of trying to return back).

    A similar form can be used for things like periodic timer requests...
    they can stay 'live' in the kernel and simply be returned over and over
    again to the userland.
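
    A rough model of such a 'live' request, covering both signals and
    periodic timers.  Everything here is an assumption made for
    illustration: the live_msg structure, the function names, and in
    particular the choice to count events that arrive while the interlock
    flag is set rather than deliver them.

```c
struct live_msg {
    int active;     /* request is still 'live' in the kernel */
    int in_upcall;  /* interlock flag: userland is handling a return */
    int delivered;  /* times the message was returned to userland */
    int deferred;   /* events held back by the interlock (assumed policy) */
};

/* "Kernel" side: a signal or timer event occurred. */
static void
kernel_event(struct live_msg *m)
{
    if (!m->active)
        return;             /* request was aborted, nothing to return */
    if (m->in_upcall) {
        m->deferred++;      /* do not stack an upcall on a running upcall */
        return;
    }
    m->delivered++;         /* return the same message yet again */
}

/* Userland tells the kernel to abort the signalling request. */
static void
abort_request(struct live_msg *m)
{
    m->active = 0;
}
```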

    I know this sounds somewhat complex but it provides us with the greatest
    flexibility as well as an incremental development approach, e.g. initially
    all system call messages are synchronous so we don't have to worry about
    reply ports.  Then we implement passive reply ports.  Then we implement
    software interrupts (upcalls), then we implement the more complex 
    signalling semantics.   All a very orderly and extremely powerful

