syscall messaging interface API

David Leimbach leimy2k at
Wed Jul 23 12:46:15 PDT 2003

Didn't the L4 folks find a way to make system calls on Pentiums without using
software interrupts?  Isn't this like 10x faster?

I need to read up on the Pistachio site again, but I think I am correct.

Any chance that could get integrated into DflyBSD?  [sorry for shortening :)]

If one could more fully separate the syscall APIs from the actual implementation, couldn't
each low-level layer do more optimizations of this kind?

Again... just shooting my naivete around.

On Wednesday, July 23, 2003, at 02:26PM, Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:

>    Here is my idea for the system messaging interface.  I will use a
>    new trap gate (0x81) to implement it, because it occurs to me that
>    a message interface really ought to pass and return information in
>    registers rather than on the stack (since the message itself is already
>    in user memory, the kernel then only has to copyin() the message
>    contents rather than both the system messaging interface arguments
>    and the contents of the message).  A new trap gate also isolates us
>    from the old syscall mechanism.
>    int 0x81 to dispatch, arguments in eax, ecx, edx, return value in eax.
>    error = sendsys(port, msg, msgsize)
>	eax:error = int0x81(eax:port, ecx:msg, edx:msgsize)
>	Send a syscall message to the kernel.  The userland requests 
>	asynchronous or synchronous operation through the standard message
>	flag MSGF_ASYNC.  The userland specifies userland pointers to the
>	userland version of the system port, the userland version of the
>	message, and the size of the message.
>	The kernel copyin()'s the message and acts on it, and either returns
>	a synchronous or asynchronous error code as per our messaging
>	API.  Results (like the return value for read() or lseek()) will be
>	stored in the message structure.  Only error (errno) codes are
>	returned in eax.
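Here is a rough user-space simulation of the synchronous sendsys() path described above. It does one copyin() of the message, writes the result back into the message, and returns only an errno value. Several pieces are made up for illustration: the message layout, SYS_GETVAL, SYSMSG_MAX, the value of MSGF_ASYNC and the EASYNC_HYP code. fake_copyin() merely stands in for the kernel's copyin().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSGF_ASYNC  0x0001u  /* flag named in the post; this value is made up */
#define EASYNC_HYP  1000     /* hypothetical "accepted, reply comes later" code */
#define SYSMSG_MAX  256      /* hypothetical cap on message size */
#define SYS_GETVAL  1        /* hypothetical command used for illustration */

/* Hypothetical message layout; the post only says results live in the message. */
struct sysmsg {
    uint32_t flags;       /* MSGF_ASYNC, ... */
    uint32_t cmd;         /* which system call */
    void    *reply_port;  /* NULL: userland waits or polls */
    int64_t  result;      /* e.g. byte count for read(), offset for lseek() */
};

/* Stand-in for copyin(); a real kernel validates the user address range. */
static int fake_copyin(const void *uaddr, void *kaddr, size_t len)
{
    if (uaddr == NULL)
        return EFAULT;
    memcpy(kaddr, uaddr, len);
    return 0;
}

/* Kernel side of sendsys(port, msg, msgsize); the return value is what
 * would be loaded into eax.  The port is ignored for now, as proposed. */
static int kern_sendsys(void *port, void *umsg, size_t msgsize)
{
    union { struct sysmsg hdr; unsigned char raw[SYSMSG_MAX]; } kbuf;
    int error;

    (void)port;
    if (msgsize < sizeof(struct sysmsg) || msgsize > sizeof(kbuf.raw))
        return EINVAL;
    if ((error = fake_copyin(umsg, kbuf.raw, msgsize)) != 0)
        return error;

    if (kbuf.hdr.flags & MSGF_ASYNC)
        return EASYNC_HYP;        /* a real kernel would queue the message */

    switch (kbuf.hdr.cmd) {
    case SYS_GETVAL:
        kbuf.hdr.result = 42;     /* the result lands in the message... */
        break;
    default:
        return ENOSYS;
    }
    /* ...and is copied back into the caller's message (copyout stand-in). */
    memcpy((char *)umsg + offsetof(struct sysmsg, result),
           &kbuf.hdr.result, sizeof(kbuf.hdr.result));
    return 0;                     /* only the errno value travels in eax */
}
```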
>	The kernel will initially ignore the userland version of the system
>	port but eventually we can use this to store interface versioning
>	information (so we don't have to load it into the message every time).
>	The kernel utilizes the reply port stored in the message to return the
>	message to userland.  The userland reply port may be NULL, in which
>	case the kernel expects the userland to explicitly wait for the
>	message to be returned or to poll for message completion passively,
>	or the reply port may be non-NULL indicating that the kernel should
>	return the message to the port.
>	The reply port, if non-NULL, controls the action taken when a
>	message is returned.  The action can be:
>	* Queue without notification
>	* Queue and perform an upcall to the (port specified) function
>	* Queue and perform an upcall managed by a critical section (the
>	  kernel would check to see if the user thread is in a critical
>	  section and if so would just flag it.  The userland would later
>	  detect that flag and flush the kernel's message queue).
>	* ... any other action that we can think of, e.g. things like queue
>	  with passive notification but revert to an upcall after a timeout
>	  if the userland doesn't call flushsys().  etc.
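The reply-port actions in the list above might be modelled roughly as follows. This is a hypothetical sketch. The enum values, the struct layouts and the per-thread crit_count/upcall_pending fields are assumptions, and the "upcall" is an ordinary function call rather than a real kernel-pushed upcall.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reply-port actions matching the first three bullets. */
enum reply_action { RP_QUEUE_ONLY, RP_QUEUE_UPCALL, RP_QUEUE_UPCALL_CRIT };

struct umsg {
    struct umsg *next;
    int id;
};

struct reply_port {
    enum reply_action action;
    void (*upcall)(struct reply_port *);  /* port-specified function */
    struct umsg *head, **tailp;            /* FIFO of returned messages */
};

/* Hypothetical per-thread userland state visible to the kernel. */
struct uthread {
    int crit_count;       /* > 0 while inside a userland critical section */
    int upcall_pending;   /* set by the kernel instead of upcalling */
};

static void port_init(struct reply_port *p, enum reply_action a,
                      void (*fn)(struct reply_port *))
{
    p->action = a;
    p->upcall = fn;
    p->head = NULL;
    p->tailp = &p->head;
}

/* Kernel returning a message to a non-NULL reply port. */
static void kern_replymsg(struct uthread *td, struct reply_port *p,
                          struct umsg *m)
{
    m->next = NULL;
    *p->tailp = m;              /* every action starts by queueing */
    p->tailp = &m->next;

    switch (p->action) {
    case RP_QUEUE_ONLY:
        break;                  /* userland polls or calls waitsys() */
    case RP_QUEUE_UPCALL:
        assert(p->upcall != NULL);
        p->upcall(p);
        break;
    case RP_QUEUE_UPCALL_CRIT:
        assert(p->upcall != NULL);
        if (td->crit_count > 0)
            td->upcall_pending = 1;   /* defer; userland flushes later */
        else
            p->upcall(p);
        break;
    }
}

/* Example upcall used only to make the sketch observable. */
static int upcalls_seen;
static void count_upcall(struct reply_port *p) { (void)p; upcalls_seen++; }
```

The third action exists so that the kernel never pushes an upcall into a thread that has declared itself busy. It only leaves a flag behind for that thread to notice later.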
>    error = waitsys(port, msg)
>	eax:error = int0x81(eax:port, ecx:msg, edx:0)
>	Ask the kernel to block until a message has been returned, or until
>	a message is pending on the specified (userland) message port, or
>	both.
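Below is a minimal sketch of the waitsys() wake-up condition. It is simulated in user space with a POSIX mutex and condition variable in place of real kernel sleep/wakeup. The structures and function names are hypothetical. Reading "or both" as "wake when either condition the caller asked about is true" is my own interpretation.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state; a real kernel would use its own sleep/wakeup. */
struct wmsg  { int done; };      /* set when the kernel returns the message */
struct wport { int pending; };   /* number of replies queued on the port */

static pthread_mutex_t wlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wcond = PTHREAD_COND_INITIALIZER;

/* Wake when any condition the caller asked about holds. */
static int wait_satisfied(const struct wport *port, const struct wmsg *msg)
{
    return (msg != NULL && msg->done) || (port != NULL && port->pending > 0);
}

/* Sketch of waitsys(port, msg): sleep until the requested condition holds. */
static int waitsys_sim(struct wport *port, struct wmsg *msg)
{
    if (port == NULL && msg == NULL)
        return EINVAL;           /* that register pattern means flushsys() */
    pthread_mutex_lock(&wlock);
    while (!wait_satisfied(port, msg))
        pthread_cond_wait(&wcond, &wlock);
    pthread_mutex_unlock(&wlock);
    return 0;
}

/* Kernel side: return the message, bump the reply port, wake any waiters. */
static void kern_complete(struct wport *port, struct wmsg *msg)
{
    pthread_mutex_lock(&wlock);
    msg->done = 1;
    if (port != NULL)
        port->pending++;
    pthread_cond_broadcast(&wcond);
    pthread_mutex_unlock(&wlock);
}

/* Helper so the completion can run on another thread in the example. */
struct completion_arg { struct wport *port; struct wmsg *msg; };

static void *completer(void *arg)
{
    struct completion_arg *c = arg;
    kern_complete(c->port, c->msg);
    return NULL;
}
```

For example, start completer() on a second thread with pthread_create() and then call waitsys_sim(NULL, &msg). The call returns once the message has been marked done.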
>    error = flushsys()
>	eax:error = int0x81(eax:NULL, ecx:NULL, edx:0)
>	Ask the kernel to flush any pending messages that were held up due
>	to userland being in a critical section.  The kernel will have
>	flagged this to the userland and the userland will then call
>	flushsys() when it exits out of its last critical section.
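The userland half of the flushsys() handshake could look something like the sketch below. It keeps a nesting counter for userland critical sections and checks the kernel's flag only when the outermost section is exited. The struct ucrit fields and flushsys_sim() are hypothetical stand-ins; a real flushsys() would trap into the kernel via int 0x81.

```c
#include <assert.h>

/* Hypothetical userland thread state shared with the kernel. */
struct ucrit {
    int count;          /* nesting depth of userland critical sections */
    int flush_needed;   /* set by the kernel when it held back a reply */
    int flush_calls;    /* instrumentation for this sketch only */
};

/* Stand-in for flushsys(), i.e. int 0x81 with eax=NULL, ecx=NULL, edx=0. */
static int flushsys_sim(struct ucrit *uc)
{
    uc->flush_calls++;  /* the kernel would now deliver deferred messages */
    return 0;
}

static void ucrit_enter(struct ucrit *uc)
{
    uc->count++;
}

/* Only the exit from the outermost section checks the kernel's flag. */
static void ucrit_exit(struct ucrit *uc)
{
    assert(uc->count > 0);
    if (--uc->count == 0 && uc->flush_needed) {
        uc->flush_needed = 0;
        flushsys_sim(uc);
    }
}
```

Suppose the kernel flushes everything it is holding at the moment of the call. Then clearing flush_needed before calling flushsys() is safe, because a reply deferred in between is picked up by the same flush.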
>    I believe that this gives us the flexibility we need.  I have also come up
>    with a novel solution for signaling!  The userland would queue
>    'signal' messages to the kernel.  The kernel would then 'return' the
>    appropriate signal message when the signal occurs.  This gives userland
>    complete control (via the reply port) on how to deal with signals.
>    Signal messages would be like continuous I/O requests.  The message would
>    still be 'live' in the kernel even after it has 'returned' it to userland.
>    The kernel would be free to return the message over and over again until
>    the userland tells it to abort the signalling request.  
>    The userland would interlock with the kernel by virtue of a flag bit
>    in the message or the reply port.  This coupled with a userland version
>    of the critical section would interlock the return-from-softint 
>    sequencing (i.e. so the kernel doesn't push an upcall on top of the same
>    upcall that is in the middle of trying to return back).
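The next sketch is a single-threaded model of a "live" signal message with the interlock described above. The kernel keeps returning the same message while it is live. If the previous upcall has not finished, it only records that the signal is pending. All names and fields here are hypothetical, and the upcall is reduced to a counter.

```c
#include <assert.h>

/* Hypothetical "live" signal request queued by userland. */
struct sigmsg {
    int signo;
    int live;          /* cleared when userland aborts the request */
    int in_upcall;     /* interlock: handler currently running */
    int pending;       /* signal arrived while the handler was running */
    int deliveries;    /* instrumentation for the sketch */
};

/* Kernel: the signal fired, so "return" the same message again. */
static void kern_signal(struct sigmsg *sm)
{
    if (!sm->live)
        return;                 /* request was aborted; nothing to return */
    if (sm->in_upcall) {
        sm->pending = 1;        /* never stack an upcall on top of itself */
        return;
    }
    sm->in_upcall = 1;
    sm->deliveries++;           /* the upcall would be pushed here */
}

/* Userland: handler finished; drop the interlock, pick up a missed signal. */
static void user_sigreturn(struct sigmsg *sm)
{
    assert(sm->in_upcall);
    sm->in_upcall = 0;
    if (sm->pending) {
        sm->pending = 0;
        kern_signal(sm);        /* in reality the kernel would re-deliver */
    }
}
```

Here, several signals arriving during one upcall collapse into a single pending delivery, much like classic Unix signal semantics. Whether the real design would queue them individually instead is left open.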
>    A similar form can be used for things like periodic timer requests...
>    they can stay 'live' in the kernel and simply be returned over and over
>    again to the userland.
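Periodic timers fit the same pattern. The hypothetical sketch below models a timer request that stays live. A simulated clock returns it once per period until userland clears its live flag. struct timermsg and kern_tick() are invented for illustration.

```c
#include <assert.h>

/* Hypothetical periodic timer request that stays live in the kernel. */
struct timermsg {
    unsigned period;    /* ticks between returns */
    unsigned next_due;  /* absolute tick of the next return */
    int live;           /* cleared when userland aborts the request */
    unsigned returned;  /* how many times the kernel has returned it */
};

/* Called by a (simulated) kernel clock on every tick. */
static void kern_tick(struct timermsg *tm, unsigned now)
{
    assert(tm->period > 0);
    if (!tm->live || now < tm->next_due)
        return;
    tm->returned++;              /* return the same message again */
    tm->next_due += tm->period;  /* and keep it live for the next period */
}
```

The next due time is advanced by the period rather than set to now + period. This keeps the returns from drifting when a tick is handled late.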
>    I know this sounds somewhat complex, but it provides us with the greatest
>    flexibility as well as an incremental development approach, e.g. initially
>    all system call messages are synchronous so we don't have to worry about
>    reply ports.  Then we implement passive reply ports.  Then we implement
>    software interrupts (upcalls), then we implement the more complex
>    signalling semantics.  All in all, a very orderly and extremely powerful
>    mechanism.
>						-Matt

More information about the Kernel mailing list