You could do worse than Mach ports

Matthew Dillon dillon at apollo.backplane.com
Fri Jul 18 10:46:44 PDT 2003


:>  In the quota case, you have to take a pretty heavy-weight view of
:>what constitutes metadata as well, since you have to deal with files
:>being extended via write(), which means you need to peek at the byte
:>count for the payload, and at the return value for success/failure.
:
:OK, either you or I are very confused here.
:
:Matt: could you comment on this?
:
:But these are in the message, not in the data blocks. You have
:access to the message... it was copied into the kernel with the
:Send()... you just need to take a protection barrier transition to
:access any data pointed to by the message.

    The issue comes down to how to deal with foreign address spaces.
    After all, if the address space is local one just passes a pointer.

    For system calls the foreign address space is the user process's
    address space.  User data pointers come in four forms:

    (1) They represent a file path
    (2) They represent a large block of data (e.g. the buffer in a read())
    (3) They represent a small block of data (e.g. gettimeofday())
    (4) They represent the message itself

    In case 1 I will be rewriting the VFS cache to completely evaluate any
    user path and create the appropriate nodes in the VFS cache tree.  This
    will occur before entry into the VFS layer and, of course, any nodes
    that are not known in the cache will be given an 'UNKNOWN' designation
    and will have to be resolved by VFS (e.g. VFS_LOOKUP()).  VFS_LOOKUP()
    would no longer access userspace path elements directly.  A secondary
    advantage of integrating VFS_LOOKUP() with the VFS cache is that we
    would no longer have to leave directory vnodes locked during the
    traversal; the path would instead be locked through the VFS cache.
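
    As a rough illustration (the structure and flag names here are
    hypothetical sketches, not actual DragonFly code), a pre-evaluated
    path element might look something like this:

        /*
         * Hypothetical VFS cache node.  Evaluating "a/b/c" would create
         * three linked nodes before the VFS layer is entered; any
         * element not already known to the cache starts in the
         * NCF_UNKNOWN state and is later resolved via VFS_LOOKUP().
         */
        #define NCF_UNKNOWN     0x0001  /* not yet resolved by the VFS */

        struct ncnode {
                struct ncnode   *nc_parent;     /* previous path element */
                char            *nc_name;       /* this path component */
                int              nc_flags;      /* NCF_* state bits */
                struct vnode    *nc_vp;         /* NULL until resolved */
        };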

    In case 2 I intend to convert the user data pointer into an IOVEC array
    of VM objects and offset ranges within those objects.  For example,
    (vnode->object, start_offset_of_read, extent_of_read).  The kernel
    would pass the IOVEC array around just like it passes UIOs around now
    (in fact, we are really talking about UIOs with their guts rearranged
    in this new scheme).
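
    To make this concrete, here is one possible shape for the rearranged
    UIO (struct and field names are illustrative, not a committed design):

        /*
         * Hypothetical object-based iovec element.  Instead of a user
         * virtual address, each entry names a VM object and a byte
         * range within it, e.g. (vnode->object, start_offset, extent).
         */
        struct xiovec {
                struct vm_object *xv_obj;       /* backing VM object */
                off_t             xv_offset;    /* start offset in object */
                size_t            xv_len;       /* extent of the I/O */
                void             *xv_kva;       /* cached kernel mapping,
                                                 * NULL until needed */
        };

        struct xuio {
                struct xiovec    *xu_iov;       /* array of object ranges */
                int               xu_iovcnt;    /* number of elements */
                size_t            xu_resid;     /* bytes remaining */
        };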

    Any kernel entity that actually needs to read the data directly will
    have to map the object into its own data space.  We might or might not
    cache such mappings in the UIO so that, in a multi-layered VFS stack
    where several layers need a direct mapping, only one mapping is
    actually made.
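
    A sketch of how such mapping caching might work (vm_object_map() is a
    placeholder for whatever primitive actually establishes the kernel
    mapping; it is not an existing interface):

        /*
         * Map an element's object range into kernel virtual space on
         * first use and cache the result, so that in a stacked VFS the
         * second and later layers reuse the first layer's mapping.
         */
        static void *
        xiovec_map(struct xiovec *xv)
        {
                if (xv->xv_kva == NULL)
                        xv->xv_kva = vm_object_map(xv->xv_obj,
                                                   xv->xv_offset,
                                                   xv->xv_len);
                return (xv->xv_kva);
        }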

    In case 3 I intend for the syscall to copyin/copyout the data (just like
    it does now).
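
    In other words, small fixed-size arguments keep using the standard
    copyin()/copyout() primitives, along these lines (the function body
    is illustrative, not real syscall code):

        /*
         * Case 3 sketch: copy a small result back across the protection
         * boundary exactly as syscalls do today.
         */
        static int
        small_copyout_example(struct timeval *utv)
        {
                struct timeval ktv;

                microtime(&ktv);        /* fill the kernel-side copy */
                return (copyout(&ktv, utv, sizeof(ktv)));
        }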

    In case 4 the kernel already copies the syscall arguments into kernel
    space (the uap family of structures), and we would do the same with
    the message.  Just think of the message as being the syscall arguments
    themselves.  We need to do this anyway because the kernel version of
    the syscall message is going to be more involved than the user
    version.
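
    One way to picture that kernel-side message (the field layout here is
    purely illustrative; only the idea that it wraps the uap-style
    arguments comes from the discussion above):

        /*
         * Hypothetical kernel syscall message.  The argument union plays
         * the role of today's uap structures; the surrounding fields are
         * the extra state that makes the kernel version "more involved"
         * than what userland sees.
         */
        struct sysmsg {
                int      sm_cmd;        /* which system call */
                int      sm_error;      /* completion status */
                void    *sm_reply;      /* where the reply is queued */
                union {
                        struct read_args   read;    /* existing uap */
                        struct write_args  write;   /*  structures  */
                } sm_args;
        };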

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>




