cvs commit: src/sys/vfs/nfs nfs_serv.c
Matthew Dillon
dillon at apollo.backplane.com
Fri Jul 16 10:38:02 PDT 2004
:-On [20040716 18:42], Matthew Dillon (dillon at xxxxxxxxxxxxxxxxxxxx) wrote:
:> Someone actually wrote a paper based on the server-side heuristic I
:> wrote in 1999?
:
:I emailed you about that site/paper back in February of this year, see
:20040215183341.GM47127 at xxxxxxxxxxxxxxxxxxxxxxx
:
:--
:Jeroen Ruigrok van der Werven <asmodai(at)wxs.nl> / asmodai / kita no mono
I'm sure it's somewhere in my pile :-)
It's an interesting paper, despite all the mistakes. I did like the work
they did to bring nfsheur (which I wrote in '99) up to date.
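For anyone who has never looked at the heuristic itself, here is a
rough user-space sketch of the kind of per-file sequential-access
tracking it performs. This is not the actual nfs_serv.c code; the
structure, the field names, and the scaling constants are simplified
stand-ins.

/*
 * Rough user-space sketch of a sequential-access heuristic in the
 * spirit of nfsheur.  Not the nfs_serv.c code; names and constants
 * are illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

struct nfsheur_entry {
    uint64_t nh_nextoff;    /* offset we expect the next read at */
    int      nh_seqcount;   /* how sequential this file has been */
};

/*
 * Update the heuristic on each READ request and return a read-ahead
 * depth in blocks.  Sequential readers earn deeper read-ahead, random
 * readers decay back toward zero.
 */
static int
readahead_depth(struct nfsheur_entry *nh, uint64_t offset, uint32_t len)
{
    if (offset == nh->nh_nextoff) {
        if (nh->nh_seqcount < 127)
            nh->nh_seqcount++;          /* looks sequential */
    } else {
        nh->nh_seqcount /= 2;           /* random-ish, back off */
    }
    nh->nh_nextoff = offset + len;
    return (nh->nh_seqcount / 16);      /* 0..7 blocks of read-ahead */
}

int
main(void)
{
    struct nfsheur_entry nh = { 0, 0 };
    uint64_t off = 0;
    int i;

    for (i = 0; i < 64; i++) {
        printf("read at %8llu -> read-ahead %d blocks\n",
            (unsigned long long)off,
            readahead_depth(&nh, off, 32768));
        off += 32768;                   /* purely sequential workload */
    }
    return (0);
}

The idea is simply that reads landing where the previous read left off
earn progressively deeper read-ahead, while misses decay the count
back toward zero.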
NFS servers are finicky beasts. There is a three-way tradeoff that
makes characterizing algorithms almost impossible. For an NFS server,
performance is a combination of (1) reducing physical disk seeking,
(2) efficient use of the disk cache, and (3) CPU overhead.
So, for example, one can reduce physical disk seeking by increasing the
read-ahead or doing unconditional read-ahead, but this presumes an
infinitely-sized disk cache. Any benchmark with fairly light cache
characteristics will see an improvement, but that doesn't mean that
doing unconditional read-ahead is a good idea. Likewise, the efficiency
of the disk cache (that is, the ability of the disk cache to satisfy
a request without having to go to the disk) has a huge impact on
server performance. An inefficient disk cache will destroy server
performance in a heavily loaded environment. CPU overhead is less of
an issue on modern systems but can still have a considerable impact on
performance.
Using tagging with SCSI disks is a function of the disk manufacturer.
Seagate has traditionally had the best tagging firmware while most other
vendors have (traditionally) had crap firmware. But tagging itself is
still at the mercy of disk seeks and on-disk cache algorithms. In
particular, on-disk cache algorithms can interfere with, or be redundant
against, the system's caches, and this can result in lower
performance... but whose fault is that? The disk cache's algorithms or
the kernel cache's algorithms? There is no definitive answer.
Memory copy overhead has dropped significantly in the last few years
relative to I/O bandwidth. A modern CPU, such as an AMD64 or a P4,
is capable of 3+ GBytes/sec worth of uncached copying bandwidth, and
this generally blows away the measly 60 MBytes/sec that a modern disk
can do. So data copying alone is not an issue any more, though its
presence within the algorithm can still cause other issues to occur
that might lead people to believe that the data copying is at fault.
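A crude way to see the copy-bandwidth side of this for yourself is to
time a large memcpy(). The buffer size and pass count below are
arbitrary, and the result will vary with the CPU, its caches, and the
compiler.

/*
 * Quick-and-dirty memcpy() bandwidth check.  Buffer size and pass
 * count are arbitrary; results depend on the CPU, caches and compiler.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE  (64UL * 1024 * 1024)  /* 64MB, well past the caches */
#define PASSES    16

int
main(void)
{
    char *src = malloc(BUF_SIZE);
    char *dst = malloc(BUF_SIZE);
    clock_t t0, t1;
    double secs, mbytes;
    int i;

    if (src == NULL || dst == NULL)
        return (1);
    memset(src, 0xA5, BUF_SIZE);

    t0 = clock();
    for (i = 0; i < PASSES; i++)
        memcpy(dst, src, BUF_SIZE);
    t1 = clock();

    /* Touch dst so the copies cannot be optimized away. */
    if (dst[BUF_SIZE - 1] != (char)0xA5)
        return (1);

    secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    mbytes = (double)BUF_SIZE * PASSES / (1024.0 * 1024.0);
    printf("copied %.0f MB in %.2f sec = %.0f MB/sec\n",
        mbytes, secs, mbytes / secs);
    free(src);
    free(dst);
    return (0);
}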
On the other hand, it is quite clear to me that the block sizes we
traditionally use for I/O are far too small. Our FS code believes
that an 8K request is reasonable and a 64K request is 'clustering'
when, in fact, the truth of the matter is that a 32K request is
reasonable and a 256K request is 'clustering'. This basic problem
in our core filesystem code is responsible for a lot of the benchmark
confusion that occurs when people try to test things above the
filesystem.
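One way to watch the request-size effect from above the filesystem is
to time sequential reads at a few block sizes. This is only a sketch:
the default path 'testfile' is a placeholder, and you need a file
considerably larger than RAM (or cold caches between runs) for the
numbers to mean anything.

/*
 * Sketch: sequential read throughput at several request sizes.  Run it
 * against a large, uncached file; 'testfile' is just a placeholder.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double
now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ((double)ts.tv_sec + (double)ts.tv_nsec / 1e9);
}

static double
throughput_mbs(const char *path, size_t bufsize)
{
    char *buf = malloc(bufsize);
    int fd = open(path, O_RDONLY);
    ssize_t n;
    size_t total = 0;
    double t0, t1;

    if (buf == NULL || fd < 0) {
        if (fd >= 0)
            close(fd);
        free(buf);
        return (-1.0);
    }

    t0 = now_sec();
    while ((n = read(fd, buf, bufsize)) > 0)
        total += (size_t)n;
    t1 = now_sec();

    close(fd);
    free(buf);
    if (t1 - t0 <= 0.0)
        return (-1.0);
    return (((double)total / (1024.0 * 1024.0)) / (t1 - t0));
}

int
main(int argc, char **argv)
{
    const size_t sizes[] = { 8192, 32768, 65536, 262144 };
    const char *path = (argc > 1) ? argv[1] : "testfile";
    size_t i;

    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
        printf("%6zu-byte reads: %.1f MB/sec\n",
            sizes[i], throughput_mbs(path, sizes[i]));
    return (0);
}

Note that the kernel's own clustering and read-ahead will tend to mask
the smaller request sizes for plain buffered reads, which is part of
the benchmark confusion being described.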
This is doubly true due to the data layout methodology used by most
modern hard disks... most hard disks lay the sectors out on their
tracks BACKWARDS rather than forwards. What this means is that the
disk's firmware cache will start caching data from the track the moment
the head settles on it and will continue caching at least until it hits
the sector that was requested... by the time it hits the sector that
was requested it will *ALREADY* have 'future' data in its cache (due
to the backwards layout). This means that modern disks almost NEVER
actually spend extra time waiting for read-ahead-requested data before
issuing the next seek.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>