[DragonFlyBSD - Bug #3408] (New) HAMMER2 syncer pipeline wedges under rapid newfs+cpdup loop

afranke bugtracker-admin at leaf.dragonflybsd.org
Wed May 6 06:57:21 PDT 2026


Issue #3408 has been reported by afranke.

----------------------------------------
Bug #3408: HAMMER2 syncer pipeline wedges under rapid newfs+cpdup loop
http://bugs.dragonflybsd.org/issues/3408

* Author: afranke
* Status: New
* Priority: Normal
* Target version: 6.6
* Start date: 2026-05-06
----------------------------------------
(authored together with Claude)

h2. Summary

A loop that rapidly @newfs_hammer2@'s a fresh PFS, mounts it,
cpdup's the running root into it, unmounts, and repeats — with no
idle time between iterations — wedges the HAMMER2 syncer pipeline
within ~3 iterations.

The wedge is *not a panic*. The system stays alive, sshd works,
unrelated processes are unaffected. cpdup is stuck in @flstik@
(generic kernel dirty-buffer wait at @sys/kern/vfs_bio.c:484@)
and HAMMER2 syncer threads are in @h2twait@ / @h2syndel@. Nothing
makes forward progress; recovery requires a power cycle.

h2. Reproducer

Tested on DragonFly master at @4f37521524@ with the virtio-modern
PCI series applied. The call chain is HAMMER2/VFS-only, so this is
likely reproducible on stock master.

<pre><code class="sh">
truncate -s 4G /var/scratch.img
vnconfig vn0 /var/scratch.img
mkdir -p /mnt/scratch
while :; do
    umount /mnt/scratch 2>/dev/null
    sync
    newfs_hammer2 -L ROOT /dev/vn0
    mount -t hammer2 /dev/vn0@ROOT /mnt/scratch
    cpdup -i0 -x / /mnt/scratch/
done
</code></pre>

The loop wedged on iteration 3 in our test (~2 minutes wall clock,
~1.7 GB of cpdup'd content). The stress harness is attached as
@hammer2-flush-wedge-stress.sh@; it is the same loop, with logging.
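For convenience, a minimal sketch of the harness's shape follows.
The attached script is authoritative; the log path and iteration
cap here are illustrative choices.

<pre><code class="sh">
#!/bin/sh
# Sketch of hammer2-flush-wedge-stress.sh -- the reproducer loop with
# per-iteration timestamps so the wedge point is visible in the log.
# LOG and MAXITER are illustrative; the attached script is authoritative.
LOG=/var/tmp/h2-stress.log
MAXITER=50

truncate -s 4G /var/scratch.img
vnconfig vn0 /var/scratch.img
mkdir -p /mnt/scratch

i=1
while [ "$i" -le "$MAXITER" ]; do
    echo "$(date '+%T') iter $i: newfs+mount" >> "$LOG"
    umount /mnt/scratch 2>/dev/null
    sync
    newfs_hammer2 -L ROOT /dev/vn0 > /dev/null
    mount -t hammer2 /dev/vn0@ROOT /mnt/scratch
    echo "$(date '+%T') iter $i: cpdup start" >> "$LOG"
    cpdup -i0 -x / /mnt/scratch/
    echo "$(date '+%T') iter $i: cpdup done" >> "$LOG"
    i=$((i + 1))
done
</code></pre>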

h2. State at wedge

<pre>
--- vmstat ---
  r   b w   fre   flt   re   pi   po   fr  da0  sg0   int   sys   ctx us sy id
  1   1 0 86.3M 0.98M 4.0G 1.8M    0 3.2G    0    0 30.4M 3.11M 21.8M  0  0 100

--- thread states (ps -axww -o pid,state,wchan) ---
B1 h2twait    syncer for the LIVE source FS (da0s1d)
B0 h2syndel   syncer11
B0 h2idle     ~32 worker threads in h2xop-ROOT.* and h2xop-LOCAL.*
D0 flstik     cpdup (kernel dirty-buffer wait, vfs_bio.c:484)
</pre>

100% CPU idle and zero disk I/O at the captured instant. Cumulative
re/pi/po/fr counters (4 GB recycled, 3.2 GB freed since boot)
suggest heavy reclaim activity preceded the wedge.

@hammer2 dumpchain /mnt/scratch@ while wedged shows hundreds of
dirty chains accumulated, every one marked
@pflags 00006243 = HAMMER2_CHAIN_ALLOCATED | _MODIFIED | _INITIAL@
with @refs=0@. None of them flushed. @dumpchain@ itself completes
(introspection is not deadlocked at the mutex level), so this is
a livelock / flush-starvation rather than a hard mutex deadlock.
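
The snapshots above come from stock tools; a small helper along
these lines (the output directory is an arbitrary choice) collects
the same state in one pass while the system is wedged:

<pre><code class="sh">
#!/bin/sh
# Snapshot the wedge state: system counters, per-thread wait channels,
# and the dirty-chain dump of the wedged PFS. OUT is an arbitrary
# choice; the mount point is the wedged destination from the reproducer.
OUT=/var/tmp/h2-wedge.$(date +%s)
mkdir -p "$OUT"

vmstat > "$OUT/vmstat.txt"
ps -axww -o pid,state,wchan,comm > "$OUT/threads.txt"
hammer2 dumpchain /mnt/scratch > "$OUT/dumpchain.txt" 2>&1

# Quick view of the wait channels named in this report
grep -E 'h2twait|h2syndel|h2idle|flstik' "$OUT/threads.txt"
</code></pre>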

h2. Probable shape of the cycle (speculative)

We did not instrument deeply enough to pin down the exact
cycle; what follows is inference from the captured states above.

cpdup reads from the source PFS (@/@) and writes to a fresh
destination PFS (/dev/vn0@ROOT) on a separate volume. At wedge:

- The source PFS's syncer is in @h2twait@
  (@sys/vfs/hammer2/hammer2_admin.c:148@) — transaction wait,
  even though source is read-only by cpdup.
- A bucket syncer is in @h2syndel@ (@hammer2_vfsops.c:2755@),
  the "looping too hard, brief restart delay" path inside
  @hammer2_vfsops_sync@.
- All worker xops threads are idle (@h2idle@).
- cpdup is at the kernel-wide dirty-buffer wait (@flstik@).

The pattern reads as a cross-PFS dependency in the flush path:
the destination's flush cannot make progress because the source's
syncer is wedged in transaction wait, and the source's syncer
cannot clear because of some dependency on the destination.
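
If this reading is right, the wedge has a userland-visible
signature: cpdup parked in @flstik@ while a hammer2 syncer sits in
@h2twait@ or @h2syndel@ across consecutive samples. A hedged
watchdog sketch (the sample interval and 3-sample threshold are
arbitrary choices):

<pre><code class="sh">
#!/bin/sh
# Poll for the combined signature described above: cpdup in flstik
# while a hammer2 syncer is in h2twait/h2syndel. Interval and
# threshold are arbitrary illustrative values.
hits=0
while :; do
    snap=$(ps -axww -o state,wchan,comm)
    if echo "$snap" | grep -q 'flstik.*cpdup' &&
       echo "$snap" | grep -Eq 'h2twait|h2syndel'; then
        hits=$((hits + 1))
    else
        hits=0
    fi
    if [ "$hits" -ge 3 ]; then
        echo "wedge signature held for 3 samples at $(date)" >&2
        ps -axww -o pid,state,wchan,comm >&2
        exit 1
    fi
    sleep 10
done
</code></pre>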

h2. Workaround

Insert @sync; sleep 60; sync@ after @newfs_hammer2@ and before the
cpdup. Anecdotally, across a handful of subsequent iterations, we
saw no wedges with the settle period and a wedge within ~3
iterations without it. This is not a deterministic mitigation; it
suggests the race window is a fresh PFS being heavily written
before the syncer has caught up with the prior PFS's teardown.
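
Concretely, the reproducer loop with the settle step inserted
(60 s is the only interval we tested; shorter settles were not
tried):

<pre><code class="sh">
# Reproducer loop with the settle period that avoided the wedge in
# our runs; the 60-second value is the only one tested.
while :; do
    umount /mnt/scratch 2>/dev/null
    sync
    newfs_hammer2 -L ROOT /dev/vn0
    # settle: let the syncer finish tearing down the prior PFS
    sync; sleep 60; sync
    mount -t hammer2 /dev/vn0@ROOT /mnt/scratch
    cpdup -i0 -x / /mnt/scratch/
done
</code></pre>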


---Files--------------------------------
hammer2-flush-wedge-stress.sh (3.32 KB)

