git: kernel - Deal with VOP_NRENAME races

Matthew Dillon dillon at crater.dragonflybsd.org
Wed Aug 26 22:56:18 PDT 2020


commit ad1212685b9caac64c086a2363d15842dff21fd8
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date:   Wed Aug 26 22:41:05 2020 -0700

    kernel - Deal with VOP_NRENAME races
    
    * VOP_NRENAME() as implemented by the kernel can race any number of
      ways, including deadlocking, allowing duplicate entries, and panicking
      tmpfs.  It typically requires a heavy test load to replicate this but
      a dsynth build triggered the issue at least once.
    
      Other recently reported tmpfs issues with log file handling might also
      be affected.
    
    * A per-mount (semi-global) lock is now obtained whenever a directory
      is renamed.  This helps deal with numerous MP races that can cause
      lock order reversals.
    
      Loosely taken from NetBSD and Linux (mjg brought me up to speed on
      this).  Renaming directories is fraught with issues and this fix,
      while somewhat brutish, is fine.  Directories are very rarely renamed
      at a high rate.
    
    * kern_rename() now proactively locks all four elements of a rename
      operation (source_dir, source_file, dest_dir, dest_file) instead of
      only two.
    
    * The new locking function, cache_lock4_tondlocked(), takes no chances
      on lock order reversals and will use a (currently brute-force)
      non-blocking and lock cycling algorithm.  Probably needs some work.
    
    * Fix a bug in cache_nlookup() related to reusing DESTROYED entries
      in the hash table.  This algorithm tried to reuse the entries while
      maintaining shared locks, since only the entries need to be manipulated
      to reuse them.  However, this resulted in lookup races which could
      cause duplicate entries.  The duplicate entries then triggered
      assertions in tmpfs.
    
    * nlookup now tries a little harder and will retry if the parent of an
      element is flagged DESTROYED after its lock was released.  DESTROYED
      elements are not necessarily temporary events as an operation can wind
      up running in a deleted directory and must properly fail under those
      conditions.
    
    * Use krateprintf() to reduce debug output related to rename race
      reporting.
    
    * Revamp nfsrv_rename() as well (requires more testing).
    
    * Allow nfs_namei() to be called in a loop for retry purposes if
      desired.  It now detects that the nd structure is initialized
      from a prior run and won't try to re-parse the mbuf (needs testing).
    
    Reported-by: zrj, mjg
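
The non-blocking, lock-cycling approach described for
cache_lock4_tondlocked() can be illustrated in user space.  The sketch
below is not the kernel code: it uses POSIX mutexes in place of
namecache locks, and the names skip_entry(), try_lock_all(), lock_all()
and unlock_all() are hypothetical.  It assumes "lock cycling" means:
try every lock without blocking, and on failure drop everything, block
on the contended lock while holding nothing else, release it, and start
over.  NULL entries (for example a destination file that does not exist
yet) and repeated entries (source and destination in the same
directory) are skipped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Skip empty slots and pointers that already appear earlier in the set. */
static int
skip_entry(pthread_mutex_t *locks[], size_t i)
{
    if (locks[i] == NULL)
        return 1;
    for (size_t j = 0; j < i; j++) {
        if (locks[j] == locks[i])
            return 1;
    }
    return 0;
}

/*
 * Try to take every lock without blocking.  Returns -1 on success.
 * On failure, releases whatever was acquired and returns the index
 * of the lock that was busy.
 */
static int
try_lock_all(pthread_mutex_t *locks[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (skip_entry(locks, i))
            continue;
        if (pthread_mutex_trylock(locks[i]) != 0) {
            for (size_t j = i; j-- > 0; ) {
                if (!skip_entry(locks, j))
                    pthread_mutex_unlock(locks[j]);
            }
            return (int)i;
        }
    }
    return -1;
}

/*
 * Acquire the whole set.  A thread never sleeps while holding any
 * lock from the set, so two threads passing the locks in different
 * orders cannot deadlock.  Returns the number of retries needed.
 */
static unsigned
lock_all(pthread_mutex_t *locks[], size_t n)
{
    unsigned retries = 0;
    int busy;

    while ((busy = try_lock_all(locks, n)) != -1) {
        /* Cycle the contended lock: wait for it, drop it, retry. */
        pthread_mutex_lock(locks[busy]);
        pthread_mutex_unlock(locks[busy]);
        retries++;
    }
    return retries;
}

/* Release the set in reverse order, mirroring the skips above. */
static void
unlock_all(pthread_mutex_t *locks[], size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        if (!skip_entry(locks, i)) {
            int rc = pthread_mutex_unlock(locks[i]);
            assert(rc == 0);
            (void)rc;
        }
    }
}
```

A caller would pass the four elements in the order given above, e.g.
{ source_dir, source_file, dest_dir, dest_file }, to lock_all() and
later to unlock_all().  The cost of this scheme is possible repeated
retries under contention, which fits the commit's own remark that the
algorithm is brute-force and probably needs some work.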

Summary of changes:
 sys/kern/vfs_cache.c    | 176 ++++++++++++++++++++++++++++++++++--------------
 sys/kern/vfs_mount.c    |   2 +
 sys/kern/vfs_nlookup.c  |  33 ++++++---
 sys/kern/vfs_syscalls.c |  93 ++++++++++++++-----------
 sys/sys/mount.h         |   1 +
 sys/sys/namecache.h     |   5 +-
 sys/vfs/nfs/nfs_serv.c  |  82 ++++++++++++++++------
 sys/vfs/nfs/nfs_subs.c  |  69 +++++++++++--------
 8 files changed, 312 insertions(+), 149 deletions(-)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/ad1212685b9caac64c086a2363d15842dff21fd8


-- 
DragonFly BSD source repository

