Recent VM/IPI fixes probably triggered a number of HAMMER2 bugs.
Volodymyr Kostyrko
arcade at b1t.name
Sat Jul 30 08:02:58 PDT 2016
Hi all.
I know that hardly anybody reads this list, but I don't want to dump
test results for alpha-stage code into public groups. On the other hand,
I believe that by testing HAMMER2 I help it make it to beta or even
production.
First of all, the recent IPI lock was easily triggered with HAMMER2, as
it eats a lot of RAM and stresses the VM system substantially. I have
already posted core dumps showing that.
After the bugathon, with all that breakage fixed, HAMMER2 still appears
to be unstable, much more so than it was before. Before the bug
squashing I was able to use a HAMMER2 partition as root for a month,
and right now simple tests make the system dump core. It looks to me
like HAMMER2 only helps leftover bugs manifest by stressing the system
to its limits.
This is the result of formatting a filesystem and copying some files to it:
0xffffffff8027cc0d in panic ()
(kgdb) #0 0xffffffff8027cc0d in panic ()
#1 0xffffffff802456ea in db_panic ()
#2 0xffffffff80245e9d in db_command_loop ()
#3 0xffffffff80248a82 in db_trap ()
#4 0xffffffff803e3948 in kdb_trap ()
#5 0xffffffff803ea87e in trap ()
#6 0xffffffff803d355f in calltrap ()
#7 0xffffffff803e3c50 in Debugger ()
#8 0xffffffe3496b2cf8 in ?? ()
#9 0xffffffff8027cc4f in panic ()
Backtrace stopped: frame did not save the PC
chain 00000002eb47cc0a.01 key=00000000000111c4 meth=30 CHECK FAIL
(flags=00144002, bref/data 321045cc1afb6358/bbab58d32ef3333b)
panic: assertion "parent->error == 0" failed in hammer2_chain_scan at
/usr/src/sys/vfs/hammer2/hammer2_chain.c:2575
cpuid = 3
Trace beginning at frame 0xffffffe3496b2c68
panic() at panic+0x25f 0xffffffff8027cc3a
panic() at panic+0x25f 0xffffffff8027cc3a
hammer2_chain_scan() at hammer2_chain_scan+0xf1 0xffffffff8172f751
hammer2_bulk_scan() at hammer2_bulk_scan+0x1d2 0xffffffff81737e37
hammer2_bulk_scan() at hammer2_bulk_scan+0x184 0xffffffff81737de9
hammer2_bulk_scan() at hammer2_bulk_scan+0x184 0xffffffff81737de9
Debugger("panic")
CPU3 stopping CPUs: 0x00000007
stopped
panic: from debugger
cpuid = 3
boot() called on cpu#3
Uptime: 12m36s
Warning: hardclock missed > 1 sec
Physical memory: 15536 MB
Looking closer at the issue, I found that HAMMER2 stalls after some
amount of data has been transferred. If I interrupt the process and
initiate a reboot, the resulting core shows no sign of HAMMER2 problems:
0xffffffff8027ccb1 in panic ()
(kgdb) #0 0xffffffff8027ccb1 in panic ()
#1 0xffffffff8024577a in db_panic ()
#2 0xffffffff80245f2d in db_command_loop ()
#3 0xffffffff80248b12 in db_trap ()
#4 0xffffffff803e3bb8 in kdb_trap ()
#5 0xffffffff803e9ef6 in trap_fatal ()
#6 0xffffffff803ea11e in trap_pfault ()
#7 0xffffffff803ea8db in trap ()
#8 0xffffffff803d37cf in calltrap ()
#9 0xffffffff802faaa6 in vfs_vptofh ()
#10 0xffffffff8024d263 in elf64_puthdr ()
#11 0xffffffff8024ed4e in generic_elf_coredump ()
#12 0xffffffff8024ee46 in elf64_coredump ()
#13 0xffffffff8027d7ad in coredump ()
#14 0xffffffff8027fa55 in sigexit ()
#15 0xffffffff802802be in postsig ()
#16 0xffffffff803ea358 in userret ()
#17 0xffffffff803eb400 in syscall2 ()
#18 0xffffffff803d3a1b in Xfast_syscall ()
#19 0x000000000000002b in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Fatal trap 12: page fault while in kernel mode
cpuid = 2; lapic->id = 12000000
fault virtual address = 0x50
fault code = supervisor read data, page not present
instruction pointer = 0x8:0xffffffff802faaa6
stack pointer = 0x10:0xffffffe33f3cf478
frame pointer = 0x10:0xffffffe33f3cf488
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 0, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 935
current thread = pri 6
kernel: type 12 trap, code=0
CPU2 stopping CPUs: 0x0000000b
stopped
panic: from debugger
cpuid = 2
Fatal trap 3: breakpoint instruction fault while in kernel mode
cpuid = 2; lapic->id = 12000000
instruction pointer = 0x8:0xffffffff803e3ec0
stack pointer = 0x10:0xffffffe33f3cf0d8
frame pointer = 0x10:0xffffffe33f3cf0d8
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, IOPL = 0
current process = 935
current thread = pri 6 (CRIT)
panic: from debugger
cpuid = 2
boot() called on cpu#2
Uptime: 15m48s
Warning: hardclock missed > 1 sec
--
Sphinx of black quartz judge my vow.