From arcade at b1t.name Sat Jul 30 08:02:58 2016
From: arcade at b1t.name (Volodymyr Kostyrko)
Date: Sat, 30 Jul 2016 18:02:58 +0300
Subject: Recent VM/IPI fixes probably triggered a number of HAMMER2 bugs.
Message-ID: <579CC1A2.2070109@b1t.name>

Hi all.

I know that almost nobody reads this list, but I don't want to dump
alpha-stage testing results into public groups. On the other hand, I
believe that by testing HAMMER2 I help it make it to beta or even
production.

First of all, the recent IPI lock was easily triggered with HAMMER2, as
it eats a lot of RAM and stresses the VM system substantially. I already
posted core dumps showing that.

After the bugathon, with all that breakage fixed, HAMMER2 still appears
to be slightly unstable, and a lot more unstable than it was before.
Before the bugsquashing I was able to use a HAMMER2 partition as root for
a month, and right now simple tests make the system dump core. It looks
to me like HAMMER2 only helps leftover bugs manifest by stressing the
system to its limits.

This is the result of formatting a filesystem and copying some files to
it:

0xffffffff8027cc0d in panic ()
(kgdb) #0  0xffffffff8027cc0d in panic ()
#1  0xffffffff802456ea in db_panic ()
#2  0xffffffff80245e9d in db_command_loop ()
#3  0xffffffff80248a82 in db_trap ()
#4  0xffffffff803e3948 in kdb_trap ()
#5  0xffffffff803ea87e in trap ()
#6  0xffffffff803d355f in calltrap ()
#7  0xffffffff803e3c50 in Debugger ()
#8  0xffffffe3496b2cf8 in ?? ()
#9  0xffffffff8027cc4f in panic ()
Backtrace stopped: frame did not save the PC

chain 00000002eb47cc0a.01 key=00000000000111c4 meth=30 CHECK FAIL (flags=00144002, bref/data 321045cc1afb6358/bbab58d32ef3333b)
panic: assertion "parent->error == 0" failed in hammer2_chain_scan at /usr/src/sys/vfs/hammer2/hammer2_chain.c:2575
cpuid = 3
Trace beginning at frame 0xffffffe3496b2c68
panic() at panic+0x25f 0xffffffff8027cc3a
panic() at panic+0x25f 0xffffffff8027cc3a
hammer2_chain_scan() at hammer2_chain_scan+0xf1 0xffffffff8172f751
hammer2_bulk_scan() at hammer2_bulk_scan+0x1d2 0xffffffff81737e37
hammer2_bulk_scan() at hammer2_bulk_scan+0x184 0xffffffff81737de9
hammer2_bulk_scan() at hammer2_bulk_scan+0x184 0xffffffff81737de9
Debugger("panic")
CPU3 stopping CPUs: 0x00000007 stopped
panic: from debugger
cpuid = 3
boot() called on cpu#3
Uptime: 12m36s
Warning: hardclock missed > 1 sec
Physical memory: 15536 MB

Looking closer at the issue, I found that HAMMER2 stalls after some
amount of data has been transferred. If I break the process and initiate
a reboot, the resulting core shows no signs of HAMMER2 problems:

0xffffffff8027ccb1 in panic ()
(kgdb) #0  0xffffffff8027ccb1 in panic ()
#1  0xffffffff8024577a in db_panic ()
#2  0xffffffff80245f2d in db_command_loop ()
#3  0xffffffff80248b12 in db_trap ()
#4  0xffffffff803e3bb8 in kdb_trap ()
#5  0xffffffff803e9ef6 in trap_fatal ()
#6  0xffffffff803ea11e in trap_pfault ()
#7  0xffffffff803ea8db in trap ()
#8  0xffffffff803d37cf in calltrap ()
#9  0xffffffff802faaa6 in vfs_vptofh ()
#10 0xffffffff8024d263 in elf64_puthdr ()
#11 0xffffffff8024ed4e in generic_elf_coredump ()
#12 0xffffffff8024ee46 in elf64_coredump ()
#13 0xffffffff8027d7ad in coredump ()
#14 0xffffffff8027fa55 in sigexit ()
#15 0xffffffff802802be in postsig ()
#16 0xffffffff803ea358 in userret ()
#17 0xffffffff803eb400 in syscall2 ()
#18 0xffffffff803d3a1b in Xfast_syscall ()
#19 0x000000000000002b in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Fatal trap 12: page fault while in kernel mode
cpuid = 2; lapic->id = 12000000
fault virtual address = 0x50
fault code = supervisor read data, page not present
instruction pointer = 0x8:0xffffffff802faaa6
stack pointer = 0x10:0xffffffe33f3cf478
frame pointer = 0x10:0xffffffe33f3cf488
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 0, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 935
current thread = pri 6
kernel: type 12 trap, code=0
CPU2 stopping CPUs: 0x0000000b stopped
panic: from debugger
cpuid = 2

Fatal trap 3: breakpoint instruction fault while in kernel mode
cpuid = 2; lapic->id = 12000000
instruction pointer = 0x8:0xffffffff803e3ec0
stack pointer = 0x10:0xffffffe33f3cf0d8
frame pointer = 0x10:0xffffffe33f3cf0d8
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, IOPL = 0
current process = 935
current thread = pri 6 (CRIT)
panic: from debugger
cpuid = 2
boot() called on cpu#2
Uptime: 15m48s
Warning: hardclock missed > 1 sec

-- 
Sphinx of black quartz judge my vow.