cvs commit: src/sys/vm vnode_pager.h vnode_pager.c src/sys/kern vfs_bio.c src/sys/vfs/nfs nfs_bio.c nfs_vfsops.c nfs_vnops.c

Fri May 7 21:12:46 PDT 2004

dillon      2004/05/07 21:11:48 PDT

DragonFly src repository

  Modified files:
    sys/vm               vnode_pager.h vnode_pager.c 
    sys/kern             vfs_bio.c 
    sys/vfs/nfs          nfs_bio.c nfs_vfsops.c nfs_vnops.c 
  Log:
  Peter Edwards brought up an interesting NFS bug which we both originally
  thought would be a fairly straightforward bug fix.  But it turns out to
  require a nasty hack to fix.

  The issue is that near the file EOF NFS uses piecemeal writes and
  piecemeal buffer cache buffers.  The result is that manipulation through
  the buffer cache only sets some of the m->valid bits in the associated
  vm_page(s).  This case may also occur in the middle of a file if for
  example a file is piecemeal written and then ftruncated to be much
  larger (or lseek/write at a much higher seek position).

  The nfs_getpages() routine was assuming that if m->valid was non-0, the
  page is basically valid and no read rpc is required to fill it.

  The problem is that if you mmap() a piecemeal VM page and fault it in,
  m->valid is set to VM_PAGE_BITS_ALL (0xFF). Then, later, when NFS flushes
  the buffer cache, only some of the m->valid bits are clear (e.g. 0xFC).
  A later page fault will cause NFS to believe that the page is sufficiently
  valid and vm_fault will then zero-out the first X bytes of the page when,
  in fact, we really should have done an I/O to refill those X bytes.

  The fix in PR misc/64816 (FreeBSD) tried to solve this by checking to see
  if the m->valid bits were 'sufficiently valid' in the file EOF case but
  tesing with fsx resulted in several failure modes.  This doesn't work
  because (1) if you extend the file w/ ftruncate or lseek/write these
  partially valid pages can end up in the middle of the file rather then
  just at the end and (2) There may be a dirty buffer associated with these
  pages, meaning that the pages may contain dirty data, and we cannot safely
  overwrite the pages with a new read I/O.

  The solution in this patch is to deal with the screwy m->valid bit clearing
  but special-casing NFS and then having the BIO system clear ALL the m->valid
  bits instead of just some of them when NFS calls vinvalbuf().  That way
  m->valid will be set to 0 when the buffer is invalidated and the
  nfs_getpages() code can be left doing it's simple 'if any m->valid bits
  are set assume the whole page is valid' test.  In order for the BIO system
  to safely be able to do this (so as not to invalidate portions of a VM page
  associated with an adjacent buffer), the NFS io size has been further
  restricted to be an integral multiple of PAGE_SIZE.

  This is a terrible hack but there is no other way to fix the problem short
  of rewriting the entire buffer cache.  We will do that eventually, but not
  now.

  Reported-by: Peter Edwards <peter.edwards at xxxxxxxxxx>
  Referencing-PR: misc/64816 by Patrick Mackinlay <patrick at xxxxxxxxxxxxxxx>

  Revision  Changes    Path
  1.4       +4 -6      src/sys/vm/vnode_pager.h
  1.13      +12 -6     src/sys/vm/vnode_pager.c
  1.23      +33 -2     src/sys/kern/vfs_bio.c
  1.15      +33 -16    src/sys/vfs/nfs/nfs_bio.c
  1.17      +14 -0     src/sys/vfs/nfs/nfs_vfsops.c
  1.23      +5 -1      src/sys/vfs/nfs/nfs_vnops.c

http://www.dragonflybsd.org/cvsweb/src/sys/vm/vnode_pager.h.diff?r1=1.3&r2=1.4&f=h
http://www.dragonflybsd.org/cvsweb/src/sys/vm/vnode_pager.c.diff?r1=1.12&r2=1.13&f=h
http://www.dragonflybsd.org/cvsweb/src/sys/kern/vfs_bio.c.diff?r1=1.22&r2=1.23&f=h
http://www.dragonflybsd.org/cvsweb/src/sys/vfs/nfs/nfs_bio.c.diff?r1=1.14&r2=1.15&f=h
http://www.dragonflybsd.org/cvsweb/src/sys/vfs/nfs/nfs_vfsops.c.diff?r1=1.16&r2=1.17&f=h
http://www.dragonflybsd.org/cvsweb/src/sys/vfs/nfs/nfs_vnops.c.diff?r1=1.22&r2=1.23&f=h