Hammer errors.

Dan Cross crossd at gmail.com
Fri Jul 2 11:17:22 PDT 2021


On Thu, Jul 1, 2021 at 6:11 PM Dan Cross <crossd at gmail.com> wrote:

> On Thu, Jul 1, 2021 at 3:00 PM Matthew Dillon <dillon at backplane.com>
> wrote:
>
>> Upgrade to 6.0 for sure, as it fixes at least one bug in HAMMER2, to
>> eliminate that possibility.
>>
>
> Yes, both the previous installation and the one I put together yesterday
> were running 6.0. The previous install had been upgraded to 6.0 when 6.0
> was released (or within a couple of days).
>
>
>> RAM is a possibility, though unlikely.  If you are overclocking, turn off
>> the overclocking.  An overclocked CPU can introduce corruption more easily
>> than overclocked RAM can.
>>
>
> Not overclocking. RAM in this machine is non-ECC: I could imagine a bit
> error slipping into a checksum, though.
>
>> And check the dmesg for any NVMe-related errors.
>>
>
> No NVMe-related errors appeared in the dmesg. That a completely separate
> NVMe part would exhibit the same problem would tend to indicate a hardware
> error outside of the storage device itself (RAM, bad CPU, I suppose, or a
> signal integrity issue when seating the NVMe part) or a software bug. That
> the system had previously been running the same version of the software for
> over a month without issue, and a fresh install popped up the same errors
> on the same hardware (modulo the new storage device) would suggest some
> sort of hardware issue.
>
> I'm going to run a memtest and see if I can get an NVMe diagnostic to run
> somehow.
>

I just wanted to close the loop on this.

The problem was bad RAM in the machine. The memory test failed
spectacularly, and it appears that some data in RAM got corrupted on the
way to the original NVMe. With the RAM replaced and the filesystem
rebuilt, everything seems to be OK so far.
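
As a rough illustration of that failure mode, here is a minimal Python
sketch. It is not HAMMER2's code: CRC32 from Python's zlib stands in for
whatever check code the filesystem actually uses, and the block contents,
the `flip_bit` helper, and the flipped bit position are all hypothetical.
It only shows why a single bit flipped in RAM after the check code was
computed, but before the write, later shows up as a check failure.

```python
import zlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated RAM error)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

def verify(data: bytes, check: int) -> bool:
    """Recompute the check code and compare it with the stored one."""
    return zlib.crc32(data) == check

# Hypothetical block; the check code is computed over the good data.
block = b"example filesystem block contents" * 4
stored_check = zlib.crc32(block)

# One bit flips in memory before the block reaches the storage device.
on_disk = flip_bit(block, 37)

print(verify(block, stored_check))    # the intact block verifies
print(verify(on_disk, stored_check))  # the corrupted block does not
```

In this sketch the storage device faithfully stores what it was handed,
which would be consistent with a second, unrelated NVMe part showing the
same errors.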

I'm doing a burn-in to see if the problems manifest themselves again, but I
suspect things have settled down.

Matt, thanks for the `hammer2 -vv show ...` tip. It's detecting no errors
now.

        - Dan C.