[DragonFlyBSD - Bug #3409] (In Progress) vtscsi: panic-dump deadlock on lockmgr SIM lock
dillon
bugtracker-admin at leaf.dragonflybsd.org
Wed May 6 22:24:00 PDT 2026
Issue #3409 has been updated by dillon.
Status changed from New to In Progress
This is a more general problem that can happen with any dump device, not just vtscsi. I don't really like the AI-generated patch because it makes quite a few assumptions about the cause of the dump by checking the "panicstr" global. We might need a more general driver call that the kernel core dump code can invoke to try to make a device that may be holding locks usable.
I'm going to set the bug in-progress for now and not assign it.
----------------------------------------
Bug #3409: vtscsi: panic-dump deadlock on lockmgr SIM lock
http://bugs.dragonflybsd.org/issues/3409#change-14661
* Author: afranke
* Status: In Progress
* Priority: Normal
* Target version: 6.6
* Start date: 2026-05-06
----------------------------------------
(authored together with Claude)
h2. Summary
@vtscsi(4)@ registers @VTSCSI_MTX(sc)@ (a @lockmgr@ lock taken
with @LK_EXCLUSIVE@) as its CAM SIM lock. When the kernel
panics with @kern.dumpdev@ set on a virtio-scsi-rooted disk,
@dadump()@ calls @cam_periph_lock(periph)@ →
@lockmgr(sim->lock, LK_EXCLUSIVE)@, which blocks indefinitely
if any thread held the lock at the moment of panic, which is a
near-certainty for a busy disk. The dump deadlocks; in one
production observation the kernel sat wedged after the panic
banner until a hardware reset.
@vtblk(4)@ does not hit this: it uses @lwkt_serialize_t@, and
@vtblk_dump()@ explicitly bypasses serialization and
quiesces the device via @vtblk_prepare_dump()@
(@virtio_blk.c:496-513, 961@).
h2. Code references (DragonFly master commit @4f37521524@)
* @sys/dev/virtual/virtio/scsi/virtio_scsivar.h:150-153@ —
@VTSCSI_LOCK@ / @VTSCSI_UNLOCK@ / @_OWNED@ / @_NOTOWNED@
are unconditional @lockmgr@ / @KKASSERT@ operations with
no panic guard.
* @sys/dev/virtual/virtio/scsi/virtio_scsi.c:608@ —
@cam_sim_alloc(... VTSCSI_MTX(sc) ...)@ registers the
blocking lock as the SIM lock.
* @sys/dev/virtual/virtio/scsi/virtio_scsi.c:754, 821@ —
@vtscsi_cam_action@ and @vtscsi_cam_poll@; both assert
@VTSCSI_LOCK_OWNED(sc)@, with no @panicstr != NULL@ path.
* @sys/bus/cam/scsi/scsi_da.c:772@ — @dadump()@ calls
@cam_periph_lock(periph)@ (= @lockmgr(sim->lock,
LK_EXCLUSIVE)@) on the dump path.
h2. Fix
Attached patch
@0004-vtscsi-panic-aware-locking-and-dump-prepare.patch@:
* @VTSCSI_LOCK*@ macros become no-ops when @panicstr != NULL@,
so the dump path can drive vtscsi without blocking on a
stale-owned lock.
* @vtscsi_prepare_dump()@ (modeled on @vtblk_prepare_dump@)
runs once per dump (gated by @VTSCSI_FLAG_DUMPING@) and
reuses the existing
@vtscsi_stop@ / @vtscsi_drain_vqs@ / @vtscsi_reinit@ /
@vtscsi_disable_vqs_intr@ helpers to quiesce the device
for polled IO.
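The two ideas above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the attached patch: every @model_*@ name is hypothetical, @model_panicstr@ stands in for the kernel's @panicstr@ global, and the "would block" case returns false where a real @lockmgr@ call would sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's panicstr global: NULL until a panic occurs. */
const char *model_panicstr = NULL;

/* Hypothetical analogue of VTSCSI_FLAG_DUMPING. */
#define MODEL_FLAG_DUMPING 0x01

/* Hypothetical softc: 'owner' models the lock owner, 0 meaning unowned. */
struct model_softc {
    int owner;
    int flags;
    int quiesce_count;  /* number of times the device was quiesced */
};

/*
 * Panic-aware acquire. Returns true if the caller may proceed.
 * A real blocking lock would sleep instead of returning false.
 */
bool model_lock(struct model_softc *sc, int self)
{
    if (model_panicstr != NULL)
        return true;        /* no-op: the stale owner is ignored */
    if (sc->owner != 0 && sc->owner != self)
        return false;       /* held by another thread: would block */
    sc->owner = self;
    return true;
}

/* Panic-aware release; the ownership assertion is skipped after a panic. */
void model_unlock(struct model_softc *sc, int self)
{
    if (model_panicstr != NULL)
        return;
    assert(sc->owner == self);
    sc->owner = 0;
}

/* Quiesce the device at most once per dump, gated by the DUMPING flag. */
void model_prepare_dump(struct model_softc *sc)
{
    if (sc->flags & MODEL_FLAG_DUMPING)
        return;
    sc->flags |= MODEL_FLAG_DUMPING;
    sc->quiesce_count++;    /* stands in for stop / drain / reinit */
}

/*
 * Model of the dump path: one lock / prepare / unlock cycle per block.
 * Returns the number of blocks handled, or -1 where a real kernel
 * would have deadlocked on the lock.
 */
int model_dump(struct model_softc *sc, int self, int nblocks)
{
    for (int i = 0; i < nblocks; i++) {
        if (!model_lock(sc, self))
            return -1;
        model_prepare_dump(sc);
        model_unlock(sc, self);
    }
    return nblocks;
}
```

With @owner@ preset to another thread and @model_panicstr@ still NULL, @model_dump()@ returns -1, which corresponds to the hang described in the summary. Once @model_panicstr@ is set, the same call completes and the device is quiesced exactly once. The model also makes the trade-off visible: the guard keys purely on the panic global and says nothing about what state the stale owner left the device in.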
h2. Reproduction
Tested on a QEMU sandbox VM (q35, virtio-scsi-pci root, single
@scsi-hd@ disk):
<pre><code class="bash">
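# Point the kernel dump device at the swap partition on the virtio-scsi disk
dumpon /dev/da0s1b
# Do not drop into the debugger on panic, so the dump path runs
sysctl debug.debugger_on_panic=0
# Keep the disk busy in the background so the SIM lock is likely held
dd if=/dev/zero of=/tmp/big bs=1m count=2000 &
# Trigger the panic
sysctl debug.panic=1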
</code></pre>
*Without the patch* (n=1 production observation): kernel
prints panic banner, then sits in @lockmgr@ indefinitely. No
"Dump complete," no halt; recovered only by hardware reset.
*With the patch* (sandbox VM, log attached as @boot.log.v2@):
<pre>
Dumping 352 MB:virtio_pci0: host features: 0x79000006 ...
virtio_pci0: negotiated features: 0x10000002 ...
Aborting dump due to I/O error.
(da0:vtscsi0:0:0:0): WRITE(10). CDB: 2a 0 0 95 3 36 0 0 1 0
** DUMP FAILED (ERROR 5) **
</pre>
The dump path no longer deadlocks: @dadump → cam_periph_lock
→ vtscsi_cam_action → vtscsi_prepare_dump@ runs to completion
and the kernel halts cleanly within seconds.
h2. Known follow-up (out of scope)
The dump itself still fails with EIO at the polled write —
the post-@virtio_reinit@ request_vq isn't being driven by the
polled-IO submission path. Separate report; without this
patch the dump path never reaches the point where the EIO
occurs.
h2. Workaround
Until fixed: leave @kern.dumpdev@ unset on a virtio-scsi-rooted
DragonFly system. A panic then takes the normal
panic→reboot path, instead of deadlocking in the dump path.
---Files--------------------------------
boot.log.v1 (44.2 KB)
boot.log.v2 (44.3 KB)
0004-vtscsi-panic-aware-locking-and-dump-prepare.patch (5.78 KB)