More syscall messaging commits, and some testing code as well.

Tue Aug 12 06:13:15 PDT 2003

Hello Matthew

     So we need a bounded concurrent queue to register the
message with.  There are concurrent queue designs which allow you to
push and pop without contention as long as you have at least one item
in the queue.  When the queue is at its limit, I expect the push() will
block for that call?  How should we handle the administration of
the bounded queue size?  We could use a sysctl for a global limit.
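
To make the push() question concrete, here is a minimal userspace sketch
of a bounded queue whose push either blocks or returns EAGAIN when the
queue is at its limit.  It uses a single pthread mutex, so it is NOT one
of the contention-free designs mentioned above, and every name in it
(msgq, msgq_push, MSGQ_MAX, ...) is hypothetical rather than taken from
the committed code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MSGQ_MAX 64             /* hypothetical hard ceiling on the limit */

struct msgq {
    pthread_mutex_t lock;
    pthread_cond_t  not_full;   /* signalled whenever a slot frees up */
    void   *slots[MSGQ_MAX];
    size_t  head;               /* index of the oldest queued message */
    size_t  count;              /* number of messages currently queued */
    size_t  limit;              /* bound; would come from a sysctl */
};

/* Set up an empty queue with the given bound. */
int msgq_init(struct msgq *q, size_t limit)
{
    if (limit == 0 || limit > MSGQ_MAX)
        return EINVAL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    q->head = 0;
    q->count = 0;
    q->limit = limit;
    return 0;
}

/*
 * Queue a message.  When the queue is full, either sleep until a pop
 * frees a slot (can_block != 0) or fail immediately with EAGAIN.
 */
int msgq_push(struct msgq *q, void *msg, int can_block)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == q->limit) {
        if (!can_block) {
            pthread_mutex_unlock(&q->lock);
            return EAGAIN;
        }
        pthread_cond_wait(&q->not_full, &q->lock);
    }
    assert(q->count < q->limit);
    q->slots[(q->head + q->count) % q->limit] = msg;
    q->count++;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Remove and return the oldest message, or NULL if the queue is empty. */
void *msgq_pop(struct msgq *q)
{
    void *msg = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        msg = q->slots[q->head];
        q->head = (q->head + 1) % q->limit;
        q->count--;
        pthread_cond_signal(&q->not_full);  /* wake one blocked pusher */
    }
    pthread_mutex_unlock(&q->lock);
    return msg;
}
```

Whether a full queue should block the caller or hand back an error is
exactly the open policy question; the can_block flag only shows that
both behaviours fit behind the same call.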

Would we ever want to use a class-based bounded queue and use the user pid
as the class?  We can then use 'ulimit' to set the bounded queue size on a
per-process basis.  After we get global limits working, allowing
per-process limits would be a good exercise.  We could use a sysctl to
turn this feature on.
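
As a rough illustration of the accounting this implies, the sketch below
charges each outstanding call against a global limit and, when a toggle
is set, against a per-pid class as well.  The structure, the fixed-size
class table, and all names (qlimits, qlimits_charge, NCLASS, ...) are
assumptions made for the example; the two limit fields merely stand in
for a sysctl and a ulimit, and locking is left out entirely.

```c
#include <errno.h>
#include <stddef.h>

#define NCLASS 8                    /* hypothetical number of pid classes */

struct qclass {
    int      pid;                   /* class key: the owning process */
    unsigned inflight;              /* outstanding calls; 0 == slot free */
};

struct qlimits {
    unsigned global_inflight;
    unsigned global_limit;          /* stands in for a sysctl */
    int      per_proc_enabled;      /* stands in for the sysctl toggle */
    unsigned proc_limit;            /* stands in for a ulimit value */
    struct qclass classes[NCLASS];
};

/* Find the class for pid, or claim the first free slot for it. */
static struct qclass *qclass_lookup(struct qlimits *ql, int pid)
{
    struct qclass *free_slot = NULL;

    for (size_t i = 0; i < NCLASS; i++) {
        struct qclass *c = &ql->classes[i];
        if (c->inflight > 0 && c->pid == pid)
            return c;
        if (c->inflight == 0 && free_slot == NULL)
            free_slot = c;
    }
    if (free_slot != NULL)
        free_slot->pid = pid;
    return free_slot;               /* NULL when the table is full */
}

/* Account for one new outstanding call, or refuse with EAGAIN. */
int qlimits_charge(struct qlimits *ql, int pid)
{
    if (ql->global_inflight >= ql->global_limit)
        return EAGAIN;
    if (ql->per_proc_enabled) {
        struct qclass *c = qclass_lookup(ql, pid);
        if (c == NULL || c->inflight >= ql->proc_limit)
            return EAGAIN;
        c->inflight++;
    }
    ql->global_inflight++;
    return 0;
}

/* Release one call previously charged to pid. */
void qlimits_uncharge(struct qlimits *ql, int pid)
{
    if (ql->per_proc_enabled) {
        for (size_t i = 0; i < NCLASS; i++) {
            struct qclass *c = &ql->classes[i];
            if (c->inflight > 0 && c->pid == pid) {
                c->inflight--;
                break;
            }
        }
    }
    if (ql->global_inflight > 0)
        ql->global_inflight--;
}
```

The sketch assumes the toggle is not flipped while calls are
outstanding; a real implementation would have to handle that case.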

Going down this road gives us a lot to play with.  We could also implement
a priority-based queue and implement static and/or dynamic priorities for
syscalls based on what is going on in the kernel.

A lot to think about.  I will take a look at your work.

Regards,

Eric Chet

On Mon, 11 Aug 2003, Matthew Dillon wrote:

>     I just committed another bunch of syscall messaging stuff, plus I
>     also committed some test code for it in /usr/src/test.  This is ad-hoc
>     test code and committers are welcome to throw in their own testing
>     code in that directory willy-nilly :-)
>
>     In this commit I have managed to asynchronize nanosleep(), but there
>     are still a bunch of issues that have to be worked through.  For
>     example, we need resource limits on the number of outstanding system
>     calls we allow to be in-progress and there needs to be a mechanism to
>     abort system calls which are in-progress when a program is killed.
>
>     Right now neither exists and ^Cing a test program at the wrong time will
>     very definitely crash the system, so asynch messaging syscalls are
>     currently restricted to root-use only.
>
>     The system call argument format can be observed in sys/sysproto.h.  The
>     basic structure is, for example:
>
> struct  read_args {
> #ifdef _KERNEL
>         union sysmsg sysmsg;
> #endif
>         union usrmsg usrmsg;
>         int     fd;     char fd_[PAD_(int)];
>         void *  buf;    char buf_[PAD_(void *)];
>         size_t  nbyte;  char nbyte_[PAD_(size_t)];
> };
>
>     As you can see it consists of three pieces:
>
>     (1) The kernel message representing the system call
>     (2) The original message copied from userspace
>     (3) The system call arguments copied from userspace
>
>     I am currently investigating how best to split up system call
>     operation.  A system call must do the following:
>
>     * extract any additional data from userland.  For example, nanosleep()
>       has to copyin() the timespec structure whose pointer is provided in
>       the call arguments.
>
>       At the moment the nanosleep() code extracts the timespec structure
>       from userland and stores it in 'sysmsg', so the execution phase
>       operates entirely on the contents of sysmsg and not on usrmsg or the
>       arguments.
>
>     * execution phase, operating on data entirely within kernel space
>       (except for I/O calls of course).
>
>     * writeback phase.  e.g. nanosleep() may have to copyout() a timespec
>       structure back to userspace.
>
>     The question we face is whether it makes sense to separate the phases
>     in order to isolate the execution phase, which would allow system calls
>     to be made 'from' a kernel thread as a matter of course, rather than as
>     a special case.
>
>     --
>
>     Finally, note that the code is VERY messy.  There will be some severe
>     cleanups as time goes on.  I believe I have partitioned the functionality
>     such that we can really get some nice clean code out of the API.
>
> 					-Matt
> 					Matthew Dillon
> 					<dillon at xxxxxxxxxxxxx>
>
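
The three-phase split described in the quoted message (extract from
userland, execute on kernel-side data only, write back) can be sketched
in plain userspace C as follows.  This is only an illustration of the
structure under stated assumptions: memcpy() stands in for
copyin()/copyout(), the execution phase pretends the whole interval
elapsed instead of sleeping, and every name (sleep_msg, sleep_exec, ...)
is hypothetical rather than taken from the commit.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct tspec { long sec; long nsec; };  /* stand-in for struct timespec */

/* Kernel-side message: everything the execution phase needs. */
struct sleep_msg {
    struct tspec rqt;           /* requested interval, copied in */
    struct tspec rmt;           /* remaining interval, produced by exec */
};

/* Phase 1: pull the argument out of "userland" and validate it. */
int sleep_copyin(struct sleep_msg *m, const struct tspec *urqt)
{
    if (urqt == NULL)
        return EFAULT;
    memcpy(&m->rqt, urqt, sizeof(m->rqt));     /* stand-in for copyin() */
    if (m->rqt.sec < 0 || m->rqt.nsec < 0 || m->rqt.nsec >= 1000000000L)
        return EINVAL;
    return 0;
}

/* Phase 2: operate purely on the message, never on user pointers. */
int sleep_exec(struct sleep_msg *m)
{
    /* Assumption: the full interval elapsed, so nothing remains. */
    m->rmt.sec = 0;
    m->rmt.nsec = 0;
    return 0;
}

/* Phase 3: write the result back if the caller asked for it. */
int sleep_copyout(const struct sleep_msg *m, struct tspec *urmt)
{
    if (urmt != NULL)
        memcpy(urmt, &m->rmt, sizeof(*urmt));  /* stand-in for copyout() */
    return 0;
}

/* Userland entry point: all three phases chained together. */
int sleep_syscall(const struct tspec *urqt, struct tspec *urmt)
{
    struct sleep_msg m;
    int error;

    if ((error = sleep_copyin(&m, urqt)) != 0)
        return error;
    if ((error = sleep_exec(&m)) != 0)
        return error;
    return sleep_copyout(&m, urmt);
}
```

With the phases separated like this, a kernel thread could fill in a
sleep_msg itself and call sleep_exec() directly, skipping the copyin and
copyout steps, which is the benefit the quoted question is weighing.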
