More syscall messaging commits, and some testing code as well.

Mon Aug 11 19:47:07 PDT 2003

    I just committed another bunch of syscall messaging stuff, plus I
    also committed some test code for it in /usr/src/test.  This is ad-hoc
    test code and committers are welcome to throw in their own testing 
    code in that directory willy-nilly :-)

    In this commit I have managed to make nanosleep() asynchronous, but there
    are still a bunch of issues that have to be worked through.  For
    example, we need resource limits on the number of outstanding system
    calls we allow to be in-progress and there needs to be a mechanism to
    abort system calls which are in-progress when a program is killed.

    Right now neither exists, and hitting ^C on a test program at the wrong
    time will very definitely crash the system, so asynchronous messaging
    syscalls are currently restricted to root.
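    Neither mechanism exists yet.  A minimal sketch of what the two might
    look like follows.  Everything in it is hypothetical: the names, the
    fixed limit of 4, and the singly linked list are assumptions for
    illustration, not kernel code.  It caps the number of in-progress
    asynchronous calls per process and, when the process is killed, marks
    every outstanding call as aborted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process bookkeeping; these names do not exist in the
 * kernel and only illustrate the two missing mechanisms. */
#define MAX_ASYNC_SYSCALLS 4

struct async_call {
    struct async_call *next;
    int aborted;              /* set when the owning process is killed */
};

struct proc_async {
    int outstanding;          /* number of calls currently in progress */
    struct async_call *head;  /* list of in-progress calls */
};

/* Admit a new asynchronous call, or refuse it once the limit is reached. */
static int async_start(struct proc_async *p, struct async_call *c)
{
    if (p->outstanding >= MAX_ASYNC_SYSCALLS)
        return -1;            /* a real syscall would return an errno here */
    c->aborted = 0;
    c->next = p->head;
    p->head = c;
    p->outstanding++;
    return 0;
}

/* Normal completion: unlink the call and release its slot. */
static void async_done(struct proc_async *p, struct async_call *c)
{
    struct async_call **pp;

    for (pp = &p->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == c) {
            *pp = c->next;
            p->outstanding--;
            return;
        }
    }
}

/* Process kill: flag and unlink every in-progress call.  A real abort path
 * would also have to wait for each call to notice the flag before the
 * process's memory could be torn down. */
static int async_abort_all(struct proc_async *p)
{
    int n = 0;

    while (p->head != NULL) {
        struct async_call *c = p->head;
        p->head = c->next;
        c->aborted = 1;
        c->next = NULL;
        n++;
    }
    p->outstanding = 0;
    return n;
}
```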

    The system call argument format can be observed in sys/sysproto.h.  The
    basic structure is, for example:

struct  read_args {
#ifdef _KERNEL
        union sysmsg sysmsg;
#endif
        union usrmsg usrmsg;
        int     fd;     char fd_[PAD_(int)];
        void *  buf;    char buf_[PAD_(void *)];
        size_t  nbyte;  char nbyte_[PAD_(size_t)];
};

    As you can see, it consists of three pieces:

    (1) The kernel message representing the system call
    (2) The original message copied from userspace
    (3) The system call arguments copied from userspace
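    The relationship between the three pieces can be mocked up in userland.
    The sketch below assumes, as the #ifdef _KERNEL suggests, that userspace
    sees only usrmsg plus the arguments, and that the kernel copies that
    block in directly behind its own sysmsg.  The union contents are
    invented, the PAD_ fields are left out, and memcpy() stands in for
    copyin().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: the real unions live in the kernel headers. */
union sysmsg { struct { void *next; int state; } msg; };
union usrmsg { struct { long cmd; long error; long result; } umsg; };

/* What userland sees: the header compiled without _KERNEL. */
struct user_read_args {
    union usrmsg usrmsg;
    int     fd;
    void   *buf;
    size_t  nbyte;
};

/* What the kernel sees: the same fields, prefixed by the kernel message. */
struct kern_read_args {
    union sysmsg sysmsg;
    union usrmsg usrmsg;
    int     fd;
    void   *buf;
    size_t  nbyte;
};

/* Pieces (2) and (3) arrive in one copy that starts at usrmsg, so the
 * kernel-only sysmsg in front of them is left untouched. */
static void fake_copyin(struct kern_read_args *k, const struct user_read_args *u)
{
    memcpy((char *)k + offsetof(struct kern_read_args, usrmsg), u, sizeof(*u));
}
```

    The one-shot copy only works if the user-visible part has the same
    layout in both views, which is what the offset checks confirm.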

    I am currently investigating how best to split up system call
    operation.  A system call must do the following:

    * extract any additional data from userland.  For example, nanosleep()
      has to copyin() the timespec structure whose pointer is provided in
      the call arguments.

      At the moment the nanosleep() code extracts the timespec structure
      from userland and stores it in 'sysmsg', so the execution phase
      operates entirely on the contents of sysmsg and not on usrmsg or the
      arguments.

    * execution phase, operating on data entirely within kernel space
      (except for I/O calls of course).

    * writeback phase.  e.g. nanosleep() may have to copyout() a timespec
      structure back to userspace.
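    The three phases can be sketched for nanosleep() in userland C.  This is
    a hypothetical illustration, not the committed code: struct ts stands in
    for struct timespec, k_rqt plays the role of the copy kept in sysmsg,
    memcpy() stands in for copyin()/copyout(), and the execution phase only
    computes the remaining time for a simulated interrupted sleep instead of
    actually sleeping.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for struct timespec. */
struct ts { long sec; long nsec; };

/* Hypothetical message: userland pointers plus kernel-side copies. */
struct nanosleep_msg {
    const struct ts *u_rqtp;   /* userland pointer from the call arguments */
    struct ts       *u_rmtp;   /* userland pointer for remaining time, may be NULL */
    struct ts        k_rqt;    /* kernel copy, filled by the copyin phase */
    struct ts        k_rmt;    /* kernel result, produced by the execution phase */
};

/* Phase 1: extract additional data from userland and validate it. */
static int ns_copyin(struct nanosleep_msg *m)
{
    if (m->u_rqtp == NULL)
        return -1;
    memcpy(&m->k_rqt, m->u_rqtp, sizeof(m->k_rqt));
    if (m->k_rqt.nsec < 0 || m->k_rqt.nsec >= 1000000000L)
        return -1;
    return 0;
}

/* Phase 2: execution on kernel data only.  Pretend the sleep was
 * interrupted after slept_ns nanoseconds and compute what is left. */
static void ns_execute(struct nanosleep_msg *m, long long slept_ns)
{
    long long req = (long long)m->k_rqt.sec * 1000000000LL + m->k_rqt.nsec;
    long long rem = req > slept_ns ? req - slept_ns : 0;

    m->k_rmt.sec = (long)(rem / 1000000000LL);
    m->k_rmt.nsec = (long)(rem % 1000000000LL);
}

/* Phase 3: writeback of the remaining time, if the caller asked for it. */
static void ns_copyout(const struct nanosleep_msg *m)
{
    if (m->u_rmtp != NULL)
        memcpy(m->u_rmtp, &m->k_rmt, sizeof(m->k_rmt));
}
```

    Because ns_execute() reads only k_rqt, changes to the user's buffer
    after the copyin phase cannot affect the sleep.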

    The question we face is whether it makes sense to separate the phases
    in order to isolate the execution phase, which would allow system calls
    to be made 'from' a kernel thread as a matter of course, rather than as
    a special case.
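    One way to picture the payoff is a dispatcher that runs the phases in
    order and simply skips the userland copy phases when a kernel thread
    built the message itself.  The operations vector, the from_kernel flag
    and all the names below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-syscall operations vector. */
struct sysmsg_hdr;
struct syscall_ops {
    int (*copyin)(struct sysmsg_hdr *);   /* userland -> kernel copy */
    int (*execute)(struct sysmsg_hdr *);  /* touches kernel data only */
    int (*copyout)(struct sysmsg_hdr *);  /* kernel copy -> userland */
};

struct sysmsg_hdr {
    const struct syscall_ops *ops;
    int from_kernel;   /* nonzero when a kernel thread built the message */
};

/* With the phases separated, a kernel-originated call skips the two
 * userland-copy phases instead of needing its own code path. */
static int run_syscall(struct sysmsg_hdr *m)
{
    int error;

    if (!m->from_kernel && m->ops->copyin != NULL &&
        (error = m->ops->copyin(m)) != 0)
        return error;
    if ((error = m->ops->execute(m)) != 0)
        return error;
    if (!m->from_kernel && m->ops->copyout != NULL)
        error = m->ops->copyout(m);
    return error;
}

/* Demo operations that only record which phases ran (bit per phase). */
static int phases_run;
static int demo_copyin(struct sysmsg_hdr *m)  { (void)m; phases_run |= 1; return 0; }
static int demo_execute(struct sysmsg_hdr *m) { (void)m; phases_run |= 2; return 0; }
static int demo_copyout(struct sysmsg_hdr *m) { (void)m; phases_run |= 4; return 0; }
static const struct syscall_ops demo_ops = { demo_copyin, demo_execute, demo_copyout };
```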

    --

    Finally, note that the code is VERY messy.  There will be some severe
    cleanups as time goes on.  I believe I have partitioned the functionality
    such that we can really get some nice clean code out of the API.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>