You could do worse than Mach ports

Thu Jul 17 14:08:47 PDT 2003

:I really disagree with this (big surprise 8-)).
:
:The problem with this model is that you can only represent a small
:subset of the filesystem types, and almost none of the interesting
:ones.
:
:For anything that does anything interesting, you are going to have
:to map file data as well as metadata pages into the address space
:of whoever handles operations on the vp; specifically, here are the
:classes of manipulations, with a few examples of each:

    Yes, but there is a big difference in KVM needs when mapping meta-data
    versus mapping file data.  Regardless of how meta-data is managed, the
    critical path always has been and always will be regular file reads and
    writes, and directory lookups, and file read/write generally eats 
    directory lookup for lunch.  Remember, before I got vmiodir working
    the buffer cache dedicated very little space to caching directory meta
    data and the performance loss really wasn't all that terrible for most
    normal configurations.

    Even something like cryptfs can conceivably, if a crypto chip is 
    present, use DMA to directly access the data without it having to be
    mapped.  And if it isn't, so what?  You wind up mapping the data and
    take the performance hit.  Just because you might have to do it for
    some subsystems doesn't mean you should go and burden *ALL* the
    subsystems with that kind of overhead.  Also, data mapping and range
    locking are two entirely separate beasts.  You do not always need to
    range lock something you are mapping, and you do not always need to
    map something you are range-locking.  So integrating those functions into
    the core messaging system just adds more conditionals to the critical path,
    making the core messaging system less efficient without any performance
    improvements to show for it.
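
    To illustrate keeping the two operations separate, here is a minimal
    sketch in C.  Everything in it (struct fobj, fobj_range_lock(),
    fobj_map(), and so on) is a hypothetical name invented for this
    example, not an existing kernel interface, and the single-lock
    bookkeeping is deliberately simplistic.  The point is only that a
    caller can take a range lock without mapping anything, or map without
    locking, and neither call carries conditionals for the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file object; all fields are invented for illustration. */
struct fobj {
    char   data[64];      /* stand-in for the object's backing pages */
    bool   locked;        /* toy model: at most one range lock at a time */
    size_t lock_start;
    size_t lock_len;
    int    map_count;     /* number of outstanding mappings */
};

/* Range locking: records the locked range, maps nothing. */
static int fobj_range_lock(struct fobj *o, size_t start, size_t len)
{
    if (o->locked || len == 0 || start >= sizeof(o->data) ||
        len > sizeof(o->data) - start)
        return -1;
    o->locked = true;
    o->lock_start = start;
    o->lock_len = len;
    return 0;
}

static void fobj_range_unlock(struct fobj *o)
{
    o->locked = false;
}

/* Mapping: hands back a pointer to the data, takes no lock. */
static char *fobj_map(struct fobj *o, size_t off)
{
    if (off >= sizeof(o->data))
        return NULL;
    o->map_count++;
    return &o->data[off];
}

static void fobj_unmap(struct fobj *o)
{
    assert(o->map_count > 0);
    o->map_count--;
}
```

    A subsystem that needs both simply calls both; one that needs only
    one of them pays for only that one.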

    It is important to keep interface APIs as simple as possible.  For example,
    the core messaging API does not require message sizes to be specified or
    memory objects to be registered.  That isn't its job.  If I want to 
    transition a core message across a protection layer I do it via the 
    port agent when the message is sent.  That way the *SAME* device driver
    could operate in kernelland or in userland and be optimal in both.  The
    kernelland version's port agent would not have to translate messages 
    across a protection boundary; the userland version's port agent would.
    In fact, if several such devices were all operating in the same userland
    process their port agents would not have to do any translation within
    userland, either.
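
    The port agent arrangement described above might be sketched roughly
    as follows.  This is an illustration under assumptions, not the actual
    messaging API: struct msg, struct port, send_msg(), agent_direct() and
    agent_translate() are all hypothetical names, and the memcpy() merely
    stands in for whatever a real agent would do to move a message across
    a protection boundary.  Note that the message carries no size and no
    registered memory objects; only the agent knows about the boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message: no size field, no registered memory objects. */
struct msg {
    int cmd;
    int result;
};

struct port;
typedef int (*port_agent_fn)(struct port *p, struct msg *m);

/* Hypothetical port: the agent is chosen when the port is set up. */
struct port {
    port_agent_fn agent;
    int (*handler)(struct msg *m);   /* the driver's entry point */
    int translations;                /* boundary crossings, for the demo */
};

/* The driver: identical code no matter where it runs. */
static int driver_handle(struct msg *m)
{
    m->result = m->cmd * 2;
    return 0;
}

/* Same-domain agent: pass the message pointer straight through. */
static int agent_direct(struct port *p, struct msg *m)
{
    return p->handler(m);
}

/* Cross-domain agent: copy in, run the driver, copy the result out. */
static int agent_translate(struct port *p, struct msg *m)
{
    struct msg copy;
    int error;

    memcpy(&copy, m, sizeof(copy));  /* stand-in for a copy across the boundary */
    p->translations++;
    error = p->handler(&copy);
    m->result = copy.result;         /* stand-in for copying the reply back */
    return error;
}

/* Sending is the same call either way; the agent does the rest. */
static int send_msg(struct port *p, struct msg *m)
{
    assert(p->agent != NULL);
    return p->agent(p, m);
}
```

    In this sketch the direct path costs one indirect call, and the
    translation overhead is paid only by ports that were set up with
    the translating agent.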

    But the device driver itself would work the same either way, and
    the best case winds up being the optimal case.  That is what
    is important.  If I burden the messaging API with message sizes, 
    registered memory objects, data space conversions, data mappings, and
    all sorts of other sundry Mach-like stuff I impose a severe burden
    on the 'best' case message passing situation which unnecessarily slows
    down the system and makes the messaging API far less useful because it
    would no longer be a lightweight mechanism and thus no longer be
    suitable for lightweight operations.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>