Recent VM/IPI fixes probably triggered a number of HAMMER2 bugs.

Volodymyr Kostyrko arcade at b1t.name
Sat Jul 30 08:02:58 PDT 2016


Hi all.

I know that almost nobody reads this list, but I don't want to dump alpha 
testing results into public groups. On the other hand, I believe 
that by testing HAMMER2 I help it make it to beta or even production.

First of all, the recent IPI lock was easily triggered with HAMMER2, as it 
eats a lot of RAM and stresses the VM system substantially. I already posted 
core dumps showing that.

After the bugathon, with all the breakage fixed, HAMMER2 still appears to be 
slightly unstable, and a lot more unstable than it was before. Before the 
bug squashing I was able to use a HAMMER2 partition as root for a month, 
and right now simple tests make the system dump core. It looks to me like 
HAMMER2 only helps leftover bugs manifest by stressing the system to its 
limits.

This is the result of formatting a filesystem and copying some files to it:

0xffffffff8027cc0d in panic ()
(kgdb) #0  0xffffffff8027cc0d in panic ()
#1  0xffffffff802456ea in db_panic ()
#2  0xffffffff80245e9d in db_command_loop ()
#3  0xffffffff80248a82 in db_trap ()
#4  0xffffffff803e3948 in kdb_trap ()
#5  0xffffffff803ea87e in trap ()
#6  0xffffffff803d355f in calltrap ()
#7  0xffffffff803e3c50 in Debugger ()
#8  0xffffffe3496b2cf8 in ?? ()
#9  0xffffffff8027cc4f in panic ()
Backtrace stopped: frame did not save the PC

chain 00000002eb47cc0a.01 key=00000000000111c4 meth=30 CHECK FAIL 
(flags=00144002, bref/data 321045cc1afb6358/bbab58d32ef3333b)
panic: assertion "parent->error == 0" failed in hammer2_chain_scan at 
/usr/src/sys/vfs/hammer2/hammer2_chain.c:2575
cpuid = 3
Trace beginning at frame 0xffffffe3496b2c68
panic() at panic+0x25f 0xffffffff8027cc3a
panic() at panic+0x25f 0xffffffff8027cc3a
hammer2_chain_scan() at hammer2_chain_scan+0xf1 0xffffffff8172f751
hammer2_bulk_scan() at hammer2_bulk_scan+0x1d2 0xffffffff81737e37
hammer2_bulk_scan() at hammer2_bulk_scan+0x184 0xffffffff81737de9
hammer2_bulk_scan() at hammer2_bulk_scan+0x184 0xffffffff81737de9
Debugger("panic")

CPU3 stopping CPUs: 0x00000007
  stopped
panic: from debugger
cpuid = 3
boot() called on cpu#3
Uptime: 12m36s
Warning: hardclock missed > 1 sec
Physical memory: 15536 MB

Looking closer at the issue, I found that HAMMER2 stalls after some 
amount of data has been transferred. If I interrupt the process and initiate 
a reboot, the resulting core shows no signs of HAMMER2 problems:

0xffffffff8027ccb1 in panic ()
(kgdb) #0  0xffffffff8027ccb1 in panic ()
#1  0xffffffff8024577a in db_panic ()
#2  0xffffffff80245f2d in db_command_loop ()
#3  0xffffffff80248b12 in db_trap ()
#4  0xffffffff803e3bb8 in kdb_trap ()
#5  0xffffffff803e9ef6 in trap_fatal ()
#6  0xffffffff803ea11e in trap_pfault ()
#7  0xffffffff803ea8db in trap ()
#8  0xffffffff803d37cf in calltrap ()
#9  0xffffffff802faaa6 in vfs_vptofh ()
#10 0xffffffff8024d263 in elf64_puthdr ()
#11 0xffffffff8024ed4e in generic_elf_coredump ()
#12 0xffffffff8024ee46 in elf64_coredump ()
#13 0xffffffff8027d7ad in coredump ()
#14 0xffffffff8027fa55 in sigexit ()
#15 0xffffffff802802be in postsig ()
#16 0xffffffff803ea358 in userret ()
#17 0xffffffff803eb400 in syscall2 ()
#18 0xffffffff803d3a1b in Xfast_syscall ()
#19 0x000000000000002b in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)


Fatal trap 12: page fault while in kernel mode
cpuid = 2; lapic->id = 12000000
fault virtual address   = 0x50
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff802faaa6
stack pointer           = 0x10:0xffffffe33f3cf478
frame pointer           = 0x10:0xffffffe33f3cf488
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 0, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 935
current thread          = pri 6
kernel: type 12 trap, code=0

CPU2 stopping CPUs: 0x0000000b
  stopped
panic: from debugger
cpuid = 2


Fatal trap 3: breakpoint instruction fault while in kernel mode
cpuid = 2; lapic->id = 12000000
instruction pointer     = 0x8:0xffffffff803e3ec0
stack pointer           = 0x10:0xffffffe33f3cf0d8
frame pointer           = 0x10:0xffffffe33f3cf0d8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 935
current thread          = pri 6 (CRIT)
panic: from debugger
cpuid = 2
boot() called on cpu#2
Uptime: 15m48s
Warning: hardclock missed > 1 sec
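
A minimal sketch of how the point of the stall described above might be 
pinned down, assuming a plain sequential write is enough to reproduce it 
(the original test copied real files, so this is only an approximation). 
The function name, its parameters and the target path are hypothetical and 
are not part of the original report. The script writes data in fixed-size 
chunks, syncs each chunk, and logs the running byte count, so that if the 
filesystem hangs the last log line shows roughly how much data went through.

```python
import os
import sys
import time


def write_with_progress(dst_path, total_bytes, chunk_size=1 << 20,
                        slow_seconds=5.0, log=sys.stderr):
    """Write total_bytes of random data to dst_path, one chunk at a time.

    Each chunk is flushed and fsync'ed, and the running total is logged
    immediately, so a hard stall leaves the last offset in the log.
    Returns a list of (offset, seconds) for chunks slower than slow_seconds.
    """
    slow_chunks = []
    written = 0
    chunk = os.urandom(chunk_size)  # reused for every write
    with open(dst_path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            start = time.monotonic()
            f.write(chunk[:n])
            f.flush()
            os.fsync(f.fileno())  # push the data down to the filesystem
            elapsed = time.monotonic() - start
            if elapsed > slow_seconds:
                slow_chunks.append((written, elapsed))
            written += n
            print(f"{written} bytes written, last chunk {elapsed:.3f}s",
                  file=log, flush=True)
    return slow_chunks


if __name__ == "__main__":
    # Hypothetical mount point of the HAMMER2 test partition.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/h2test/stress.bin"
    for offset, seconds in write_with_progress(target, 8 << 30):
        print(f"slow chunk at offset {offset}: {seconds:.1f}s")
```

Redirecting stderr to a file on a different, known-good filesystem keeps 
the progress log readable after a forced reboot.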



-- 
Sphinx of black quartz judge my vow.


