git: kernel - Deal with VOP_NRENAME races
Matthew Dillon
dillon at crater.dragonflybsd.org
Wed Aug 26 22:56:18 PDT 2020
commit ad1212685b9caac64c086a2363d15842dff21fd8
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Wed Aug 26 22:41:05 2020 -0700
kernel - Deal with VOP_NRENAME races
* VOP_NRENAME() as implemented by the kernel can race any number of
ways, including deadlocking, allowing duplicate entries, and panicing
tmpfs. It typically requires a heavy test load to replicate this but
a dsynth build triggered the issue at least once.
Other recently reported tmpfs issues with log file handling might also
be effected.
* A per-mount (semi-global) lock is now obtained whenever a directory
is renamed. This helps deal with numerous MP races that can cause
lock order reversals.
Loosely taken from netbsd and linux (mjg brought me up to speed on
this). Renaming directories is fraught with issues and this fix,
while somewhat brutish, is fine. Directories are very rarely renamed
at a high rate.
* kern_rename() now proactively locks all four elements of a rename
operation (source_dir, source_file, dest_dir, dest_file) instead of
only two.
* The new locking function, cache_lock4_tondlocked(), takes no chances
on lock order reversals and will use a (currently brute-force)
non-blocking and lock cycling algorithm. Probably needs some work.
* Fix a bug in cache_nlookup() related to reusing DESTROYED entries
in the hash table. This algorithm tried to reuse the entries while
maintaining shared locks, since only the entries need to be manipulate
to reuse them. However, this resulted in lookup races which could
cause duplicate entries. The duplicate entries then triggered
assertions in TMPFS.
* nlookup now tries a little harder and will retry if the parent of an
element is flagged DESTROYED after its lock was released. DESTROYED
elements are not necessarily temporary events as an operation can wind
up running in a deleted directory and must properly fail under those
conditions.
* Use krateprintf() to reduce debug output related to rename race
reporting.
* Revamp nfsrv_rename() as well (requires more testing).
* Allow nfs_namei() to be called in a loop for retry purposes if
desired. It now detects that the nd structure is initialized
from a prior run and won't try to re-parse the mbuf (needs testing).
Reported-by: zrj, mjg
Summary of changes:
sys/kern/vfs_cache.c | 176 ++++++++++++++++++++++++++++++++++--------------
sys/kern/vfs_mount.c | 2 +
sys/kern/vfs_nlookup.c | 33 ++++++---
sys/kern/vfs_syscalls.c | 93 ++++++++++++++-----------
sys/sys/mount.h | 1 +
sys/sys/namecache.h | 5 +-
sys/vfs/nfs/nfs_serv.c | 82 ++++++++++++++++------
sys/vfs/nfs/nfs_subs.c | 69 +++++++++++--------
8 files changed, 312 insertions(+), 149 deletions(-)
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/ad1212685b9caac64c086a2363d15842dff21fd8
--
DragonFly BSD source repository
More information about the Commits
mailing list