Errors during mirror-copy after upgrading to 5.6.1

tech_lists at mail.com tech_lists at mail.com
Sat Jun 29 05:57:41 PDT 2019


Greetings!

After upgrading from 5.4 to 5.6.1 I am no longer able to mirror two HAMMER PFSs to their respective slaves. Of course it could be just a coincidence, but the problems started right after the upgrade.

- In my PC I have a hard disk (spinning disk) with 8 master PFSs. Each of them is mirrored to a second disk in the same PC and to a third disk in another PC (so I have two identical copies of the main disk).
- Both PCs run DragonFlyBSD 5.6.1 and HAMMER is version 1.

- Two of these 8 PFSs can no longer be mirrored, whether I use the first backup disk or the second. This is the `hammer mirror-copy` invocation and its output:

root# hammer mirror-copy /home/comm /mhome/pfs/comm
Prescan to break up bulk transfer
Prescan 2 chunks, total 4659 MBytes (4295704808, 590357072)
hammer: Mirror-read /home/comm failed: Numerical argument out of domain
hammer: Mirror-write /mhome/pfs/comm: Did not get termination sync record, or rec_size is wrong rt=-1

- And these are the errors in /var/log/messages

Jun 28 15:49:17 copernico kernel: hammer_btree_extract: CRC DATA @ a00003b000390000/16384 FAILED
Jun 28 15:49:17 copernico kernel: hammer_btree_extract: CRC DATA @ a00003b1be8c8000/16384 FAILED
Jun 28 15:49:17 copernico kernel: hammer_btree_extract: CRC DATA @ a00003abdd0dc000/16384 FAILED
Jun 28 15:49:17 copernico kernel: hammer_btree_extract: CRC DATA @ a00003acfd444000/16384 FAILED
Jun 28 15:49:17 copernico kernel: hammer_btree_extract: CRC DATA @ a00003b001320000/16384 FAILED
Jun 28 15:49:17 copernico kernel: hammer_btree_extract: CRC DATA @ a00003b012b90000/16384 FAILED
Jun 28 15:49:17 copernico kernel: hammer_btree_extract: CRC DATA @ a00003b0d78d4000/16384 FAILED
Jun 28 15:49:17 copernico kernel: hammer_btree_extract: CRC DATA @ a00003b1be1d0000/16384 FAILED
Jun 28 15:49:17 copernico kernel: hammer_btree_extract: CRC DATA @ a00003b1be8cc000/16384 FAILED
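
To see whether the same offsets fail on every attempt, the messages above can be tallied per offset. This is only a minimal sketch: it assumes the kernel message format shown above, and the function name `crc_fail_offsets` is made up for illustration.

```shell
# List each distinct failing offset together with how often it was logged,
# most frequent first.
# Assumption: lines look like
#   "... hammer_btree_extract: CRC DATA @ <hex offset>/<size> FAILED"
# crc_fail_offsets is a hypothetical helper name, not part of hammer(8).
crc_fail_offsets() {
    logfile="${1:-/var/log/messages}"
    grep 'hammer_btree_extract: CRC DATA @' "$logfile" |
        sed 's|.*CRC DATA @ \([0-9a-f]*\)/.*|\1|' |
        sort | uniq -c | sort -rn
}
```

Running `crc_fail_offsets /var/log/messages` after several failed attempts would show whether the failures cluster on a fixed set of offsets or move around between runs.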

Some more information:

- During cleanups (snapshots/prune/rebalance/reblock, I didn't try to force a recopy) there are no errors (neither on the command line, nor in /var/log/messages). This is true for all three disks.
- I computed the md5sum of every file in these problematic PFSs and in both mirrors, again without errors (maybe a dumb test, but I thought that forcing all data to be read could expose errors).
- So only `hammer mirror-copy` raises errors.
- After many retries (10-20) the mirror operation on one of the problematic PFSs succeeded on both mirrors. The contents of the two copies were identical (but I don't know whether they are both identically corrupted!). After some successful mirror-copy invocations the errors started again.
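
The checksum test mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the exact commands that were run: `tree_manifest` and `compare_trees` are hypothetical helper names, and `md5sum` is assumed to be available on the PATH.

```shell
# Print a sorted "checksum  relative-path" manifest for a directory tree.
# tree_manifest is a hypothetical helper; md5sum is assumed to be installed.
tree_manifest() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort -k 2)
}

# Compare the manifests of two trees.
# Prints the differing lines and returns non-zero if the trees differ.
compare_trees() {
    a=$(mktemp) && b=$(mktemp) || return 2
    tree_manifest "$1" > "$a"
    tree_manifest "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

For example, `compare_trees /home/comm /mhome/pfs/comm` would compare the master with one of its mirrors. Such a test reads only the current file contents, so it cannot show whether `hammer mirror-copy` touches other data that normal file reads never reach.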


Can somebody give me a hint on how to proceed with the investigation?

I could try to copy the data off these PFSs and then destroy them and their slaves, but I'm afraid that would simply hide a disk problem. My main fear is that some disk corruption is going on, but if that were the case I think DragonFly would refuse to mount the disk.


Thank you!
Andrea


