
Bug #2962

Hammer PFS Slave has broken symbolic link, recreating it doesn't work

Added by benjolitz 5 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
VFS subsystem
Target version:
-
Start date:
10/29/2016
Due date:
-
% Done:
0%


Description

I setup mirroring as described in this document - https://www.dragonflybsd.org/docs/how_to_implement_hammer_pseudo_file_system__40___pfs___41___slave_mirroring_from_pfs_master/

I've experienced several power failures, and I've noticed that one of my mirrored backups no longer resolves its symbolic link.

If I destroy the errored PFS (tv) and recreate it via hammer pfs-slave, the symbolic link still doesn't work.

Console output:

nyx# file /Archive2Backup/pfs/*
/Archive2Backup/pfs/movies: symbolic link to @@0x0000000108a74b20:00001
/Archive2Backup/pfs/tv: broken symbolic link to @@0x0000000100058744:00002
nyx# hammer pfs-status /Archive2Backup/pfs/tv
/Archive2Backup/pfs/tv PFS #2 {
sync-beg-tid=0x0000000000000001
sync-end-tid=0x0000000100058744
shared-uuid=816cf516-5783-11e6-8627-d150991a2d92
unique-uuid=67a8792c-9e2a-11e6-8958-d150991a2d92
label=""
prune-min=00:00:00
operating as a SLAVE
snapshots directory defaults to /var/hammer/<pfs>
}
nyx# hammer pfs-status /Archive2Backup/pfs/movies
/Archive2Backup/pfs/movies PFS #1 {
sync-beg-tid=0x0000000000000001
sync-end-tid=0x0000000108a74ba0
shared-uuid=7bef50aa-5783-11e6-8627-d150991a2d92
unique-uuid=1f30e13b-5784-11e6-8627-d150991a2d92
label=""
prune-min=00:00:00
operating as a SLAVE
snapshots directory defaults to /var/hammer/<pfs>
}
nyx# ls /Archive2Backup/@@0x0000000108a74d20:00001
.DS_Store
._.DS_Store
<snip>

nyx#
nyx# ls /Archive2Backup/@@0x0000000100058744:00002
ls: /Archive2Backup/@@0x0000000100058744:00002: No such file or directory

The symlink is updated by `hammer mirror-copy`/`hammer mirror-stream`.

I simply cannot mount_null it, list files, or do anything else with this PFS. Destroying and recreating it changes nothing.

History

#1 Updated by benjolitz 5 months ago

After discussing with Matt Dillon on the IRC channel, I've decided to copy the master's files to an adjacent drive, then dump the master and slave metadata, and finally destroy and recreate both the master and slave HAMMER filesystems.

The metadata dumps will be posted here when my backup is complete.

#2 Updated by benjolitz 5 months ago

  • Status changed from New to Resolved

After the IRC discussion, I noticed that the session recreating the PFS had not completed as I expected. Once it did, the faulted PFS was recreated correctly.

Therefore I consider this issue closed, given that:

1) A faulted PFS may be deleted using `pfs-destroy`.
2) It may be recreated using `mirror-copy`.
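That two-step recovery can be sketched as a small shell function; the slave path is taken from the console output above, while the master path `/Archive2/pfs/tv` is my assumption. `mirror-copy` creates the destination as a slave PFS when it does not already exist.

```shell
# Sketch of the pfs-destroy / mirror-copy recovery cycle described
# above. The master path /Archive2/pfs/tv is an assumption; the slave
# path comes from the console output. Run as root.
recreate_faulted_slave() {
    # Remove the faulted slave PFS; its stale symlink goes with it.
    hammer pfs-destroy /Archive2Backup/pfs/tv

    # mirror-copy recreates the destination as a slave PFS and streams
    # the master's history into it. The slave's symlink resolves only
    # after this first copy completes.
    hammer mirror-copy /Archive2/pfs/tv /Archive2Backup/pfs/tv
}
```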

While that operation was pending, I made another backup of the master PFS to another drive entirely, also running HAMMER.

To my dismay, attempts to delete the backup directory (2.1TiB) caused a denial of service on the system. After a hard reset, HAMMER attempted to roll back the transaction log, causing yet another denial of service.

After giving it 6 hours to mount via `mount_hammer` (Control-T revealed it was in `nbufs`), I gave up, booted into single-user mode, and disabled the master/slave drives for that HAMMER filesystem.

Any attempt to mount the DoS'ing HAMMER filesystem was ineffective due to its insistence on reverting to a prior transaction using its UNDO log.

I've since promoted the backup (slave) to master and mounted it in the correct place.

The now-faulted HAMMER master (with 2.1TiB pending deletions of history) remains unmounted.

If there's any advice or interest in figuring out why `sudo rm -rf /Archive1/backup` caused a denial of service, I'm happy to conduct any activities on it for one week from this update (ending on 11/5/2016).

After that, I will reformat the faulted master and set it up as a new mirrored-slave of the recently promoted slave-turned-master.

No data has been lost due to the master-slave mirroring of HAMMER, however this experience would have been catastrophic if I had conducted a very large deletion sweep on a HAMMER partition (no explicit PFS used to hold the errant data).

Lessons learned:

1. If you have a faulted PFS, destroy it and recreate it, then wait for mirroring to finish; the symlink to it remains invalid until the first mirror-copy completes.

2. If you make a multi-terabyte copy of a HAMMER master onto another HAMMER master with the default snapshot/history configuration, do not attempt to delete it all at once. I suggest deleting in sweeps of a few gigabytes, increasing the batch size until system latency is noticeably impacted.

3. If you are storing vast quantities of data on HAMMER, ensure your snapshot/history configuration is sensible. Mine used the defaults, which now seems questionable.

4. Mirror-mirror-mirror your data.

5. If you find yourself unable to boot due to HAMMER redo on a NON-ROOT HAMMER filesystem, use the boot menu to launch single-user mode, mount /var, remount root read-write, and use `vi` or any other editor to comment out the guilty filesystem's entry in /etc/fstab so you can get a working environment.

6. Do not take this as an indictment of HAMMER but as a "stupid user" story wherein I provoked a pathological case while incorrectly assuming my earlier efforts to remirror the PFS were ineffective.
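The batched deletion in lesson 2 can be sketched as a small POSIX shell loop; `sweep_delete`, its batch size, and the one-second pause between sweeps are my own illustrative choices, not an established HAMMER recipe.

```shell
# Sketch of lesson 2: delete a huge tree in bounded sweeps instead of a
# single rm -rf, so the filesystem's UNDO/history load stays manageable.
# The batch size is an assumption; tune it upward until latency becomes
# noticeable, as suggested above.
sweep_delete() {
    TARGET=$1
    BATCH=${2:-1000}   # files removed per sweep
    # Loop while at least one regular file remains under TARGET.
    while [ -n "$(find "$TARGET" -type f -print 2>/dev/null | head -n 1)" ]; do
        # Remove at most BATCH files, then let pending I/O settle.
        find "$TARGET" -type f -print 2>/dev/null | head -n "$BATCH" \
            | while IFS= read -r f; do rm -f "$f"; done
        sync
        sleep 1
    done
    # Only empty directories remain at this point.
    rm -rf "$TARGET"
}
```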
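Lesson 5's fstab edit can also be done non-interactively; this sketch assumes the guilty filesystem's fstab line begins with its device path (the `DEVICE` argument below is a placeholder, not a path from this report) and it keeps a backup of the original file.

```shell
# Sketch of lesson 5's single-user-mode recovery. In single-user mode
# you would first run:
#   mount /var
#   mount -u -o rw /
# DEVICE is a placeholder for the guilty filesystem's device path.
comment_out_fstab_entry() {
    DEVICE=$1                 # placeholder, e.g. a /dev/serno/... path
    FSTAB=${2:-/etc/fstab}    # second argument eases testing
    cp "$FSTAB" "$FSTAB.bak"  # keep a backup of the original
    # Prefix matching lines with '#' so the filesystem is skipped at boot.
    sed "s|^${DEVICE}|#&|" "$FSTAB.bak" > "$FSTAB"
}
```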

Ironically, the original offender (Archive2) and its backup (Archive2Backup) were restored completely and have no issues, while my worst-case hedge of backing up the master and then deleting a now-unneeded copy of the file hierarchy ended up causing an even bigger problem.

#3 Updated by benjolitz 5 months ago

Apparently I forgot the rest of the sentence:

"No data has been lost due to the master-slave mirroring of HAMMER, however this experience would have been catastrophic if I had conducted a very large deletion sweep on a HAMMER partition (no explicit PFS used to hold the errant data)."

should be completed with:

"on a HAMMER partition without a mirror (i.e. the root)".

#4 Updated by tkusumi 5 months ago

What version (uname -r) is this?

#5 Updated by benjolitz 5 months ago

[ben@nyx ~]$ uname -r
4.4-RELEASE

I was in the process of upgrading to 4.6 on the first stage (make buildworld) but stopped when I discovered the problems.

Quite happy to resume upgrading (up to and including master) if that is advisable.
