[DragonFlyBSD - Bug #2962] (Resolved) Hammer PFS Slave has broken symbolic link, recreating it doesn't work

bugtracker-admin at leaf.dragonflybsd.org
Sun Oct 30 21:32:39 PDT 2016


Issue #2962 has been updated by benjolitz.

Status changed from New to Resolved

After the IRC discussion, I noticed that the session recreating the PFS had not yet completed, contrary to what I had assumed. Once it finished, it had recreated the faulted PFS correctly.

Therefore I consider this issue closed given that:

1) A faulted PFS may be deleted using `hammer pfs-destroy`.
2) It may be recreated using `hammer mirror-copy`.
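
A minimal sketch of that destroy-and-recreate sequence, using only the hammer subcommands named in this report (pfs-destroy, pfs-slave, mirror-copy). The helper names `run` and `recreate_slave` and the `DRY_RUN` variable are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch only: destroy a faulted slave PFS and re-mirror it from its master.
# DRY_RUN=1 (the default) prints each command instead of executing it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# recreate_slave MASTER_PFS SLAVE_PFS SHARED_UUID
recreate_slave() {
    master=$1
    slave=$2
    uuid=$3
    # Remove the faulted slave, create an empty slave with the master's
    # shared-uuid, then do a one-off full copy from the master.
    run hammer pfs-destroy "$slave" &&
    run hammer pfs-slave "$slave" "shared-uuid=$uuid" &&
    run hammer mirror-copy "$master" "$slave"
}
```

For the `tv` slave in the original report this would be called as `recreate_slave <master-pfs-path> /Archive2Backup/pfs/tv 816cf516-5783-11e6-8627-d150991a2d92`; the master path is not shown in the report, so it is left as a placeholder. The symlink should only be expected to resolve after mirror-copy has finished.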

While that operation was pending, I made another backup of the master PFS to another drive entirely, also running HAMMER.

To my dismay, attempts to delete the backup directory (2.1TiB) caused a denial of service on the system. A hard reset only led HAMMER to attempt to roll back the transaction log, which caused yet another denial of service.

After giving it 6 hours to attempt to mount via `mount_hammer` (Control-T revealed it was in `nbufs`), I gave up, booted into single user mode, and disabled the master/slave drives for that HAMMER filesystem.
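
A rough sketch of the "disable it from single user mode" step, assuming it was done by commenting out the filesystem's entry in /etc/fstab (the report does not say exactly how). The function name `comment_out_mount` is hypothetical:

```shell
#!/bin/sh
# Sketch only. From single-user mode, root is typically read-only, so first:
#   mount -uw /      # remount root read-write
#   mount /var       # if /var is a separate filesystem
#
# comment_out_mount MOUNTPOINT [FSTAB]
# Prefix with '#' every non-comment fstab line whose second field is MOUNTPOINT.
comment_out_mount() {
    mp=$1
    fstab=${2:-/etc/fstab}
    tmp="$fstab.new"
    awk -v mp="$mp" '
        $1 !~ /^#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$fstab" > "$tmp" && mv "$tmp" "$fstab"
}
```

For example, `comment_out_mount /Archive1` would disable the entry mounted at /Archive1 so the next boot skips it; keeping a copy of the original fstab beforehand is prudent.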

Any attempts to mount the DoS'ing HAMMER filesystem were ineffective due to its insistence on reverting to a prior transaction using its undo log.

I've since switched to the backup (slave) and made it the master and mounted it in the correct place.

The now-faulted HAMMER master (with 2.1TiB pending deletions of history) remains unmounted.

If there's any advice or interest in figuring out why a `sudo rm -rf /Archive1/backup` caused a denial of service, I'm happy to conduct any activities on it for one week from this update (ending on 11/5/2016).

After that, I will reformat the faulted master and set it up as a new mirrored-slave of the recently promoted slave-turned-master.

No data has been lost, thanks to the master-slave mirroring of HAMMER. However, this experience would have been catastrophic if I had conducted a very large deletion sweep on a HAMMER partition (no explicit PFS used to hold the errant data).


Lessons learned:

1. If you have a faulted PFS, destroy it and recreate it. Wait for any mirroring to complete, as the symlink to it will remain invalid until then.

2. If you make a multi-terabyte copy of a HAMMER master to another HAMMER master with the default snapshot/history configuration, do not attempt to delete it all at once. I suggest deleting in sweeps of a few gigabytes and increasing the sweep size until system latency is noticeably impacted.

3. If you are storing vast quantities of data on HAMMER, ensure your snapshot/history configuration is sensible. Mine was using the defaults, which now seems questionable.

4. Mirror-mirror-mirror your data.

5. If you find yourself unable to boot due to HAMMER redo on a NON-ROOT HAMMER filesystem, use the boot menu to launch single user mode, mount /var, remount root as read-write, and use `vi` or any other editor to comment out the guilty filesystem's entry in /etc/fstab so you can get a working environment.

6. Do not take this as an indictment of HAMMER but a "stupid user" story wherein I provoked a pathological case while incorrectly assuming my earlier efforts to remirror the PFS were ineffective.
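
As an illustration of lesson 2, here is a minimal sketch of a staged deletion. It batches by file count rather than by gigabytes (a simplification), and the function name `delete_in_batches`, the default batch size, and the default pause are all hypothetical. Whether this actually avoids the stall described above has not been tested; paths containing newlines are not handled.

```shell
#!/bin/sh
# Sketch only: remove a large tree in small sweeps instead of one rm -rf.
#
# delete_in_batches DIR [BATCH] [PAUSE]
# Removes non-directory entries under DIR, calling sync and sleeping PAUSE
# seconds after every BATCH removals, then removes the emptied directories.
delete_in_batches() {
    dir=$1
    batch=${2:-1000}
    pause=${3:-30}

    find "$dir" ! -type d -print | {
        n=0
        while IFS= read -r f; do
            rm -f -- "$f"
            n=$((n + 1))
            if [ "$n" -ge "$batch" ]; then
                sync              # give the filesystem a chance to flush
                sleep "$pause"    # and the system a chance to recover
                n=0
            fi
        done
    }

    # Children are listed before parents, so each rmdir sees an empty directory.
    find "$dir" -depth -type d -exec rmdir {} \;
}
```

A cautious first run might be `delete_in_batches /Archive1/backup 500 60`, raising the batch size only while the system stays responsive.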


Ironically, the original offender (Archive2) and its backup (Archive2Backup) were restored completely and have no issues, while my worst-case precaution of backing up the master and then deleting a no-longer-needed copy of the file hierarchy ended up causing an even bigger problem.




----------------------------------------
Bug #2962: Hammer PFS Slave has broken symbolic link, recreating it doesn't work
http://bugs.dragonflybsd.org/issues/2962#change-13029

* Author: benjolitz
* Status: Resolved
* Priority: High
* Assignee: 
* Category: VFS subsystem
* Target version: 
----------------------------------------
I set up mirroring as described in this document - https://www.dragonflybsd.org/docs/how_to_implement_hammer_pseudo_file_system__40___pfs___41___slave_mirroring_from_pfs_master/

I've experienced several power failures, and I've noticed that the symbolic link for one of my mirrored backups no longer resolves.

If I destroy the errored pfs (tv) and recreate it via hammer pfs-slave, the symbolic link still doesn't work.

Console output:

nyx# file /Archive2Backup/pfs/*
/Archive2Backup/pfs/movies: symbolic link to @@0x0000000108a74b20:00001
/Archive2Backup/pfs/tv:     broken symbolic link to @@0x0000000100058744:00002
nyx# hammer pfs-status /Archive2Backup/pfs/tv
/Archive2Backup/pfs/tv  PFS #2 {
    sync-beg-tid=0x0000000000000001
    sync-end-tid=0x0000000100058744
    shared-uuid=816cf516-5783-11e6-8627-d150991a2d92
    unique-uuid=67a8792c-9e2a-11e6-8958-d150991a2d92
    label=""
    prune-min=00:00:00
    operating as a SLAVE
    snapshots directory defaults to /var/hammer/<pfs>
}
nyx# hammer pfs-status /Archive2Backup/pfs/movies
/Archive2Backup/pfs/movies      PFS #1 {
    sync-beg-tid=0x0000000000000001
    sync-end-tid=0x0000000108a74ba0
    shared-uuid=7bef50aa-5783-11e6-8627-d150991a2d92
    unique-uuid=1f30e13b-5784-11e6-8627-d150991a2d92
    label=""
    prune-min=00:00:00
    operating as a SLAVE
    snapshots directory defaults to /var/hammer/<pfs>
}
nyx# ls /Archive2Backup/@@0x0000000108a74d20:00001
.DS_Store
._.DS_Store
<snip>

nyx#
nyx# ls /Archive2Backup/@@0x0000000100058744:00002
ls: /Archive2Backup/@@0x0000000100058744:00002: No such file or directory

The symlink is updated as per hammer mirror-copy/hammer mirror-stream.

I simply cannot mount_null, list files or do anything on this PFS. Destroy/Recreate does nothing. 





