syscall messaging interface API
David Leimbach
leimy2k at mac.com
Wed Jul 23 12:46:15 PDT 2003
Didn't the L4 folks find a way to make system calls on Pentiums without using
software interrupts? Isn't this like 10x faster?
I need to read stuff at the Pistachio site again but I think I am correct.
Any chance that could get integrated into DflyBSD? [sorry for shortening :)]
If one could more fully separate the syscall APIs from the actual implementation, couldn't
each low-level layer do more optimizations like this?
Again... just shooting my naivete around.
Dave
On Wednesday, July 23, 2003, at 02:26PM, Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:
> Here is my idea for the system messaging interface. I will use a
> new trap gate (0x81) to implement it, because it occurs to me that
> a message interface really ought to pass and return information in
> registers rather than on the stack (since the message itself is already
> in user memory, we might as well do the copyin() only on the message
> contents rather than on both the system messaging interface arguments
> and the contents of the message). And a new trap gate isolates us from
> the old syscall mechanism.
>
> int 0x81 to dispatch, arguments in eax, ecx, edx, return value in eax.
>
> error = sendsys(port, msg, msgsize)
>
> eax:error = int0x81(eax:port, ecx:msg, edx:msgsize)
>
> Send a syscall message to the kernel. The userland requests
> asynchronous or synchronous operation through the standard message
> flag MSGF_ASYNC. The userland specifies userland pointers to the
> userland version of the system port, the userland version of the
> message, and the size of the message.
>
> The kernel copyin()'s the message and acts on it, and returns either
> a synchronous or an asynchronous error code as per our messaging
> API. Results (like the return value for read() or lseek()) will be
> stored in the message structure. Only error (errno) codes are
> returned in eax.
>
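To make the "results in the message, errno in eax" split concrete, here is a minimal user-space sketch. Everything in it is an assumption for illustration: the struct sysmsg layout, the SYSMSG_LSEEK opcode, and fake_sendsys() are made up and are not the actual DragonFly interface. It only shows the shape of the convention described above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical message layout; the post does not define these fields. */
struct sysmsg {
    int  cmd;     /* assumed opcode selecting the operation */
    long arg;     /* assumed single argument (e.g. a new file offset) */
    long result;  /* result lives in the message, not in a register */
};

enum { SYSMSG_LSEEK = 1 };  /* made-up opcode for illustration */

/*
 * Stand-in for the kernel side of sendsys(). The return value models
 * eax: it carries only an errno-style code, never the operation result.
 */
int fake_sendsys(void *port, struct sysmsg *msg, size_t msgsize)
{
    (void)port;  /* the post says the kernel initially ignores the port */

    if (msg == NULL || msgsize != sizeof(*msg))
        return EINVAL;
    if (msg->cmd != SYSMSG_LSEEK)
        return ENOSYS;
    if (msg->arg < 0)
        return EINVAL;

    msg->result = msg->arg;  /* pretend the seek landed at the new offset */
    return 0;
}
```

A caller would fill in cmd and arg, call fake_sendsys(NULL, &m, sizeof m), check the returned error, and only then read m.result.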
> The kernel will initially ignore the userland version of the system
> port but eventually we can use this to store interface versioning
> information (so we don't have to load it into the message every time).
>
> The kernel utilizes the reply port stored in the message to return the
> message to userland. The userland reply port may be NULL, in which
> case the kernel expects the userland to explicitly wait for the
> message to be returned or to poll for message completion passively,
> or the reply port may be non-NULL indicating that the kernel should
> return the message to the port.
>
> The reply port, if non-NULL, controls the action taken when a
> message is returned. The action can be:
>
> * Queue without notification
>
> * Queue and perform an upcall to the (port specified) function
>
> * Queue and perform an upcall managed by a critical section (the
> kernel would check to see if the user thread is in a critical
> section and if so would just flag it. The userland would later
> detect that flag and flush the kernel's message queue).
>
> * ... any other action that we can think of, e.g. things like queue
> with passive notification but revert to an upcall after a timeout
> if the userland doesn't call flushsys(). etc.
>
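The reply-port actions listed above could be modelled roughly as a mode field on the port. This is only a sketch under assumptions: the enum names, the struct reply_port fields, and return_message() are hypothetical, and the timeout variant from the last bullet is left out.

```c
#include <stddef.h>

/* Hypothetical action selector stored in the reply port. */
enum reply_mode {
    REPLY_QUEUE,        /* queue without notification */
    REPLY_UPCALL,       /* queue, then call the port's function */
    REPLY_UPCALL_CRIT   /* queue, then upcall unless in a critical section */
};

struct reply_port {
    enum reply_mode mode;
    void (*upcall)(struct reply_port *);  /* port-specified function */
    int in_critical;   /* nonzero while the user thread is in a critical section */
    int pending_flag;  /* set when an upcall was deferred */
    int queued;        /* number of messages queued on the port */
    int upcalls;       /* bookkeeping for the sample upcall below */
};

/* Sample upcall that just counts how often it ran. */
void count_upcall(struct reply_port *rp)
{
    rp->upcalls++;
}

/* Models the kernel returning one message to a non-NULL reply port. */
void return_message(struct reply_port *rp)
{
    rp->queued++;  /* every listed action queues the message first */

    switch (rp->mode) {
    case REPLY_QUEUE:
        break;
    case REPLY_UPCALL:
        if (rp->upcall != NULL)
            rp->upcall(rp);
        break;
    case REPLY_UPCALL_CRIT:
        if (rp->in_critical)
            rp->pending_flag = 1;  /* just flag it; userland flushes later */
        else if (rp->upcall != NULL)
            rp->upcall(rp);
        break;
    }
}
```

Keeping the policy in the port rather than in each message means one port can change how all of its replies are delivered.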
> error = waitsys(port, msg)
>
> eax:error = int0x81(eax:port, ecx:msg, edx:0)
>
> Ask the kernel to block until a message has been returned, or until
> a message is pending on the specified (userland) message port, or
> both.
>
> error = flushsys()
>
> eax:error = int0x81(eax:NULL, ecx:NULL, edx:0)
>
> Ask the kernel to flush any pending messages that were held up due
> to userland being in a critical section. The kernel will have
> flagged this to the userland and the userland will then call
> flushsys() when it exits out of its last critical section.
>
>
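Since sendsys(), waitsys() and flushsys() all enter through the same gate, the three register values are what tell them apart. A rough sketch of that decoding follows. The enum and classify_int81() are hypothetical names, and the rule that a send needs a non-NULL msg (but not a non-NULL port) is my assumption, not something the post states.

```c
#include <stdint.h>

enum sysop { SYSOP_SEND, SYSOP_WAIT, SYSOP_FLUSH, SYSOP_INVALID };

/*
 * Decide which operation an int 0x81 entry requests, based only on
 * the eax (port), ecx (msg) and edx (msgsize) values listed above.
 */
enum sysop classify_int81(uintptr_t eax_port, uintptr_t ecx_msg,
                          uint32_t edx_size)
{
    (void)eax_port;  /* only consulted below when edx is zero */

    if (edx_size != 0) {
        /* sendsys: non-zero size; assumed to require a message pointer */
        return ecx_msg != 0 ? SYSOP_SEND : SYSOP_INVALID;
    }
    if (eax_port == 0 && ecx_msg == 0)
        return SYSOP_FLUSH;  /* flushsys: NULL, NULL, 0 */

    return SYSOP_WAIT;       /* waitsys: port and/or msg given, size 0 */
}
```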
> I believe that this gives us the flexibility we need. I have also come up
> with a novel solution for signaling! The userland would queue
> 'signal' messages to the kernel. The kernel would then 'return' the
> appropriate signal message when the signal occurs. This gives userland
> complete control (via the reply port) on how to deal with signals.
>
> Signal messages would be like continuous I/O requests. The message would
> still be 'live' in the kernel even after it has 'returned' it to userland.
> The kernel would be free to return the message over and over again until
> the userland tells it to abort the signalling request.
>
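The "live" signal message idea can be pictured as a registration that keeps being handed back until it is aborted. The sketch below is a user-space model only: struct sigmsg and the sig_register(), sig_raise() and sig_abort() helpers are invented names, not part of any proposed API.

```c
/* Hypothetical signal message that stays registered across deliveries. */
struct sigmsg {
    int signo;       /* signal this message is waiting for */
    int live;        /* nonzero while the kernel still holds the request */
    int deliveries;  /* how many times the message has been returned */
};

/* Userland queues the signal message to the "kernel". */
void sig_register(struct sigmsg *m, int signo)
{
    m->signo = signo;
    m->live = 1;
    m->deliveries = 0;
}

/*
 * The "kernel" sees a signal. If the message is live and matches, it is
 * returned to userland again; it stays live afterwards.
 * Returns 1 if the message was returned, 0 otherwise.
 */
int sig_raise(struct sigmsg *m, int signo)
{
    if (!m->live || m->signo != signo)
        return 0;
    m->deliveries++;
    return 1;
}

/* Userland tells the kernel to abort the signalling request. */
void sig_abort(struct sigmsg *m)
{
    m->live = 0;
}
```

The same shape would fit a periodic timer request: one registration, many returns, one abort.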
> The userland would interlock with the kernel by virtue of a flag bit
> in the message or the reply port. This coupled with a userland version
> of the critical section would interlock the return-from-softint
> sequencing (i.e. so the kernel doesn't push an upcall on top of the same
> upcall that is in the middle of trying to return back).
>
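Here is one way the userland critical section and the flush flag might fit together, modelled entirely in user space. The struct uthread fields and the function names are assumptions; flushsys_sim() stands in for the real flushsys() trap and simply releases whatever the "kernel" held back.

```c
/* Hypothetical per-thread state shared between userland and kernel. */
struct uthread {
    int crit_count;    /* nesting depth of userland critical sections */
    int flush_needed;  /* flag set by the kernel when it held a message back */
    int held;          /* messages held while in a critical section */
    int delivered;     /* messages actually delivered (upcall ran) */
};

void crit_enter(struct uthread *t)
{
    t->crit_count++;
}

/* The "kernel" returns a message: deliver now, or hold it and flag it. */
void kernel_return_message(struct uthread *t)
{
    if (t->crit_count > 0) {
        t->held++;
        t->flush_needed = 1;
    } else {
        t->delivered++;
    }
}

/* Stand-in for flushsys(): deliver everything that was held up. */
int flushsys_sim(struct uthread *t)
{
    t->delivered += t->held;
    t->held = 0;
    t->flush_needed = 0;
    return 0;
}

/* Only leaving the LAST critical section triggers the flush. */
void crit_exit(struct uthread *t)
{
    if (--t->crit_count == 0 && t->flush_needed)
        flushsys_sim(t);
}
```

Because delivery is deferred while crit_count is non-zero, an upcall handler that runs inside a critical section cannot have a second upcall pushed on top of it.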
> A similar form can be used for things like periodic timer requests...
> they can stay 'live' in the kernel and simply be returned over and over
> again to the userland.
>
> I know this sounds somewhat complex but it provides us with the greatest
> flexibility as well as an incremental development approach, e.g. initially
> all system call messages are synchronous so we don't have to worry about
> reply ports. Then we implement passive reply ports. Then we implement
> software interrupts (upcalls), then we implement the more complex
> signalling semantics. All a very orderly and extremely powerful
> mechanism.
>
> -Matt
>
>
>