Random server crashes every few weeks (smp_invltlb: endless loop […] retrysmp_invltlb: ipi sent)

Stefan Unterweger 232.20711 at chiffre.aleturo.com
Fri May 27 00:38:48 PDT 2016

* Matthew Dillon on Thu, May 26, 2016 at 11:00:18AM -0700:
> It's really hard to say from something which is virtually hosted.  It kinda
> sounds like the virtual host isn't assigning enough of its own cpus to the
> virtual host.  The fact that DragonFly is complaining about smp_invltlb()
> implies that the host's virtualized cpu threads are not getting scheduled
> properly.
> One thing to note is that we do not do any instruction escapes to hint to
> virtual hosts when a cpu is in a tight loop waiting for synchronization.
> It would be nice if we had some support for that, it would probably make
> DFly play better on virtualized systems.
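
For reference, the kind of hint described above is usually a spin-wait
instruction such as x86 PAUSE placed inside the busy loop.  On hardware
with pause-loop exiting (Intel) or a pause filter (AMD), a hypervisor
can use it to notice that a virtual cpu is merely spinning.  The
following is only a minimal user-space sketch in C, assuming GCC or
Clang inline assembly.  The names cpu_relax and spin_wait_bounded are
hypothetical, not DragonFly kernel functions, and this does not claim
to show how smp_invltlb() is implemented.

```c
#include <stdatomic.h>

/*
 * Tell the CPU (and, indirectly, a hypervisor that traps on repeated
 * PAUSE) that we are busy-waiting.  Assumes GCC/Clang inline assembly.
 */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    /* Other architectures: only a compiler barrier in this sketch. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/*
 * Hypothetical helper: spin until *flag becomes nonzero or max_spins
 * iterations have passed.  Returns the number of iterations spent
 * waiting, or -1 if the flag was never observed as set.
 */
static long spin_wait_bounded(atomic_int *flag, long max_spins)
{
    for (long spins = 0; spins < max_spins; spins++) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return spins;
        cpu_relax();    /* the "instruction escape" inside the tight loop */
    }
    return -1;
}
```

The bounded loop mirrors the idea of giving up and reporting instead of
spinning forever; the limit itself is an arbitrary parameter here.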

This is an interesting suggestion, which would explain at least some of
the cases where I’ve experienced the crashes (the daily HAMMER cronjob,
heavy paging under stress, I/O bursts and so on).

So in effect, could it be that the crashes become more likely whenever
either my own server or some -other- server that happens to run on the
same hypervisor comes under load?

Would this warrant opening a ticket with Profitbricks, or is it just as
likely that I’m wasting my time and will only get a response along the
lines of ‘Use Linux; Dragonfly BSD most certainly is not supported’?

> I suggest setting the number of cores to 1.  That will get rid of all SMP
> interplay and hopefully remove the issues the virtual host is choking on.

Interestingly enough, I have seen the opposite so far.  At first, I ran
the server on only one core, to save money and because it doesn’t really
need any more yet.  On one core it still froze, following approximately
the same pattern, but I never got a trace there.

My guess back then was that there might be some odd race condition
between paging, HAMMER and dm_crypt.  Adding another core seemed more
stable for a while, but then it regressed back to the mean.

I will try to set up another VM to see whether I can reliably reproduce
such a crash.  

Thanks for your answer,

PS: Just in case, as I forgot to include it previously: here’s the dmesg
    from the server in question.

| Copyright (c) 2003-2015 The DragonFly Project.
| Copyright (c) 1992-2003 The FreeBSD Project.
| Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
| 	The Regents of the University of California. All rights reserved.
| DragonFly v4.4.1-RELEASE #2: Sun Dec  6 19:10:59 EST 2015
|     root at www.shiningsilence.com:/usr/obj/home/justin/release/4_4/sys/X86_64_GENERIC
| TSC clock: 2600054420 Hz, i8254 clock: 1193169 Hz
| CPU: AMD Opteron 62xx class CPU (2600.11-MHz K8-class CPU)
|   Origin = "AuthenticAMD"  Id = 0x600f12  Stepping = 2
|   AMD Features=0x24500800<SYSCALL,NX,MMX+,Page1GB,LM>
|   AMD Features2=0x10be7<LAHF,CMP,SVM,ABM,SSE4A,MAS,Prefetch,OSVW,XOP,FMA4>
|   MONITOR/MWAIT Features=0x2<INTBRK>
| real memory  = 3219762176 (3070 MB)
| avail memory = 2990727168 (2852 MB)
| lapic: divisor index 0, frequency 500005713 Hz
| SMI Frequency (worst case): 28571 Hz (35 us)
| Initialize MI interrupts
| wdog: In-kernel automatic watchdog reset enabled
| kbd1 at kbdmux0
| md0: Preloaded image <initrd.img> 15728640 bytes at 0xffffffff82739ac0
| md1: Malloc disk
| ACPI: RSDP 0x00000000000FC980 000014 (v00 BOCHS )
| ACPI: RSDT 0x00000000BFFFBCA0 000040 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
| ACPI: FACP 0x00000000BFFFFF80 000074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
| ACPI: DSDT 0x00000000BFFFBCE0 00151D (v01 BXPC   BXDSDT   00000001 INTL 20100528)
| ACPI: FACS 0x00000000BFFFFF40 000040
| ACPI: APIC 0x00000000BFFFFC60 000270 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
| ACPI: HPET 0x00000000BFFFFC20 000038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
| ACPI: SRAT 0x00000000BFFFF770 0004A8 (v01 BOCHS  BXPCSRAT 00000001 BXPC 00000001)
| ACPI: SSDT 0x00000000BFFFD8E0 001E8E (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
| ACPI: SSDT 0x00000000BFFFD870 00003D (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
| ACPI: SSDT 0x00000000BFFFD200 00066E (v01 BXPC   BXSSDTPC 00000001 INTL 20100528)
| cryptosoft0: <software crypto> on motherboard
| aesni0: <AES-CBC,AES-XTS> on motherboard
| padlock0: No ACE support.
| rdrand0: No RdRand support.
| acpi0: <BOCHS BXPCRSDT> on motherboard
| ACPI: 4 ACPI AML tables successfully acquired and loaded
| ACPI FADT: SCI testing interrupt mode ...
| ACPI FADT: SCI select level/low
| objcache_reclaimlist
| objcache_reclaimlist
| objcache_reclaimlist
| objcache_reclaimlist
| acpi0: Power Button (fixed)
| acpi_timer0 on acpi0
| acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
| acpi_hpet0: frequency 100000000
| pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
| pci0: <ACPI PCI bus> on pcib0
| pci_link4: Unable to route IRQs: AE_NOT_FOUND
| isab0: <PCI-ISA bridge> at device 1.0 on pci0
| isa0: <ISA bus> on isab0
| atapci0: <Intel PIIX3 WDMA2 controller> port 0xc120-0xc12f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 1.1 on pci0
| ata0: <ATA channel 0> on atapci0
| ata1: <ATA channel 1> on atapci0
| acd0: DVDROM <QEMU DVD-ROM/1.0> at ata1-master WDMA2
| uhci0: <Intel 82371SB (PIIX3) USB controller> port 0xc0c0-0xc0df irq 11 at device 1.2 on pci0
| usbus0: controller did not stop
| usbus0 on uhci0
| pci0: <bridge> (vendor 0x8086, dev 0x7113) at device 1.3 irq 9
| vgapci0: <VGA-compatible display> mem 0xfd000000-0xfdffffff at device 2.0 on pci0
| vgapci0: Boot video device
| virtio_pci0: <VirtIO PCI Balloon adapter> port 0xc0e0-0xc0ff irq 11 at device 3.0 on pci0
| virtio_pci1: <VirtIO PCI Block adapter> port 0xc000-0xc03f mem 0xfebf0000-0xfebf0fff irq 10 at device 5.0 on pci0
| vtblk0: <VirtIO Block Adapter> on virtio_pci1
| virtio_pci1: host features: 0x710006d4 <EventIdx,RingIndirect,NotifyOnEmpty,Topology,WriteCache,SCSICmds,BlockSize,DiskGeometry,MaxNumSegs>
| virtio_pci1: negotiated features: 0x254 <WriteCache,BlockSize,DiskGeometry,MaxNumSegs>
| virtio_pci2: <VirtIO PCI Block adapter> port 0xc040-0xc07f mem 0xfebf1000-0xfebf1fff irq 10 at device 6.0 on pci0
| vtblk1: <VirtIO Block Adapter> on virtio_pci2
| virtio_pci2: host features: 0x710006d4 <EventIdx,RingIndirect,NotifyOnEmpty,Topology,WriteCache,SCSICmds,BlockSize,DiskGeometry,MaxNumSegs>
| virtio_pci2: negotiated features: 0x254 <WriteCache,BlockSize,DiskGeometry,MaxNumSegs>
| virtio_pci3: <VirtIO PCI Network adapter> port 0xc100-0xc11f mem 0xfebf2000-0xfebf2fff irq 11 at device 7.0 on pci0
| vtnet0: <VirtIO Networking Adapter> on virtio_pci3
| virtio_pci3: host features: 0x711f8060 <EventIdx,RingIndirect,NotifyOnEmpty,RxModeExtra,VLanFilter,RxMode,ControlVq,Status,MrgRxBuf,TxAllGSO,MacAddress>
| virtio_pci3: negotiated features: 0x110f8020 <RingIndirect,NotifyOnEmpty,VLanFilter,RxMode,ControlVq,Status,MrgRxBuf,MacAddress>
| usbus0: 12Mbps Full Speed USB v1.0
| vtnet0: MAC address: 02:01:06:f6:1b:63
| add dynamic link state
| virtio_pci4: <VirtIO PCI Block adapter> port 0xc080-0xc0bf mem 0xfebf3000-0xfebf3fff irq 11 at device 8.0 on pci0
| vtblk2: <VirtIO Block Adapter> on virtio_pci4
| virtio_pci4: host features: 0x710006d4 <EventIdx,RingIndirect,NotifyOnEmpty,Topology,WriteCache,SCSICmds,BlockSize,DiskGeometry,MaxNumSegs>
| virtio_pci4: negotiated features: 0x254 <WriteCache,BlockSize,DiskGeometry,MaxNumSegs>
| atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
| atkbd0: <AT Keyboard> irq 1 on atkbdc0
| kbd0 at atkbd0
| ugen0.1: <Intel> at usbus0
| uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
| psm0: <PS/2 Mouse> irq 12 on atkbdc0
| psm0: model IntelliMouse Explorer, device ID 4
| cpu0: <ACPI CPU> on acpi0
| cpu_cst0: <ACPI CPU C-State> on cpu0
| cpu1: <ACPI CPU> on acpi0
| cpu_cst1: <ACPI CPU C-State> on cpu1
| ACPI: Enabled 16 GPEs in block 00 to 0F
| orm0: <ISA Option ROM> at iomem 0xe9800-0xeffff on isa0
| pmtimer0 on isa0
| vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
| sc0: <System console> at flags 0x100 on isa0
| sc0: VGA <16 virtual consoles, flags=0x300>
| sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
| sio0: type 16550A
| sio1: can't drain, serial port might not exist, disabling
| hpt27xx: no controller detected.
| CAM: Configuring 2 busses
| CAM: finished configuring all busses
| cd0 at ata1 bus 0 target 0 lun 0
| cd0: <QEMU QEMU DVD-ROM 1.0> Removable CD-ROM SCSI-0 device 
| cd0: 16.000MB/s transfers
| cd0: cd present [329728 x 2048 byte records]
| uhub0: 2 ports with 2 removable, self powered
| ugen0.2: <QEMU> at usbus0
| uhid0: <QEMU QEMU USB Tablet, class 0/0, rev 1.00/0.00, addr 2> on usbus0
| no B_DEVMAGIC (bootdev=0)
| Device Mapper version 4.16.0 loaded
| dm_target_zero: Successfully initialized
| dm_target_crypt: Successfully initialized
| dm_target_error: Successfully initialized
| Mounting root from ufs:md0s0
| DMA space used: 1236k, remaining available: 131072k
| Mounting devfs
| dm_target_crypt: Setting min/max mpipe buffers: 2/30
| dm_target_crypt: Setting min/max mpipe buffers: 2/30
| HAMMER(Rhaal) recovery check seqno=055a4f51
| HAMMER(Rhaal) recovery range 300000000cc2da60-300000000cc2da60
| HAMMER(Rhaal) recovery nexto 300000000cc2da60 endseqno=055a4f52
| HAMMER(Rhaal) mounted clean, no recovery needed
| chroot_kernel: set new rootnch/rootvnode to /new_root
| dm_target_crypt: Setting min/max mpipe buffers: 2/30
| dm_target_crypt: Setting min/max mpipe buffers: 2/30
| dm_target_crypt: Setting min/max mpipe buffers: 2/30
| HAMMER: read-only -> read-write
| HAMMER(Rhaal-Daten) recovery check seqno=352f5bc4
| HAMMER(Rhaal-Daten) recovery range 3000000000d5c108-3000000000d77bc8
| HAMMER(Rhaal-Daten) recovery nexto 3000000000d77bc8 endseqno=352f5cc1
| HAMMER(Rhaal-Daten) recovery undo  3000000000d5c108-3000000000d77bc8 (113344 bytes)(RW)
| HAMMER(Rhaal-Daten) Found REDO_SYNC 3000000000cb87a0
| HAMMER(Rhaal-Daten) recovery complete
| HAMMER(Rhaal-Daten) recovery redo  3000000000d5c108-3000000000d77bc8 (113344 bytes)(RW)
| HAMMER(Rhaal-Daten) Find extended redo  3000000000cb87a0, 670056 extbytes
| HAMMER(Rhaal-Daten) End redo recovery
| dm_target_crypt: Setting min/max mpipe buffers: 2/30
| swap low/high-water marks set to 83874/125811
