Bug #2610

Hammer mirror copy causes extreme slowdown on new ssh connections

Added by t_dfbsd 10 months ago. Updated 6 months ago.

Status: New
Start date: 11/30/2013
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

I ssh into the machine to do a mirror copy or mirror stream from a master PFS on a SATA drive to a newly created slave PFS on a different SATA drive (both drives are encrypted). During the mirror copy everything seems fine and the SSH session I'm in seems responsive enough. CPU usage as reported by top doesn't exceed 40%.

If I try to make a new SSH session right after the mirror copy starts, it works normally. However, a few minutes after the mirror copy starts, any attempt to SSH takes a very long time to connect. It appears that other types of network connections, like SMB, are similarly affected. There is no activity on the box other than the mirror copy or stream. If I kill that, things return to normal.

dmesg.txt (17.8 KB) t_dfbsd, 11/30/2013 12:42 PM

History

#1 Updated by justin 10 months ago

Does it change anything if you limit the bandwidth used (-b 512k or similar)? Or use a smaller splitsize?

Those are semi-random guesses on my part.

#2 Updated by t_dfbsd 10 months ago

I should have said that the mirroring is occurring between two drives on the same machine, so it doesn't sound like the -b option would apply. Does the setting of splitsize only affect the initialization phase of the mirroring? The problem I'm seeing continues to occur well after the mirror has started.

#3 Updated by justin 10 months ago

Hmm. I'm doing the exact same thing here on 3.4 (master -> slave, both drives in same machine) and not seeing the problem. I haven't modified anything from 'normal'. I don't have as many disks as you, though.

If you have the space for it - can you copy a large file, or large set of files, from one disk to the other? The problem happens a while after the hammer copy starts, so it's probably fine while the system figures out what to move in what parts, and then starts to slow down when the actual data transfer happens. If it's the data transit itself that causes the problem, a normal bulk copy from disk to disk should show the same symptoms.
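For anyone repeating the bulk-copy comparison, a rough way to put a number on it is to time the copy and work out throughput. This is only a sketch: `timed_copy` and its parameters are names made up for illustration, not part of any HAMMER tooling, and the paths are whatever files you choose on the two disks.

```python
import os
import shutil
import time


def timed_copy(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst and return (bytes_copied, elapsed_seconds)."""
    start = time.monotonic()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream the file in fixed-size chunks so memory use stays flat.
        shutil.copyfileobj(fin, fout, chunk_size)
        fout.flush()
        # Ask the OS to push buffered data to the device before timing stops.
        os.fsync(fout.fileno())
    elapsed = time.monotonic() - start
    return os.path.getsize(dst), elapsed
```

Dividing the returned byte count by the elapsed seconds gives an average rate. Trying a new SSH login while this runs shows whether plain disk-to-disk traffic is enough to trigger the slowdown.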

Your SATA drives in your dmesg are coming up as daX devices - something that I thought would only happen with SCSI. Maybe I don't know what I'm talking about, though.

#4 Updated by t_dfbsd 10 months ago

Thanks for the suggestions. I tried copying an 8GB file and didn't have any problems while that was going on. Also, I don't recall ever having this issue on 3.4. This seems to have started with 3.6+.

#5 Updated by t_dfbsd 10 months ago

In case I wasn't completely clear, the issue is the time it takes to make the connection for a new ssh session. I timed one and it took over 5 minutes to get me to the shell prompt. Once I'm logged in, the new ssh connection responds normally.
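To narrow down where the delay sits, it may help to time a new connection in two phases: the TCP handshake, and the wait for the server's first line (normally the SSH identification string). The following is a minimal sketch, not a definitive diagnostic. `time_ssh_banner` is a hypothetical helper name, and port 22 and the timeout are assumptions.

```python
import socket
import time


def time_ssh_banner(host, port=22, timeout=600.0):
    """Return (connect_seconds, banner_seconds, first_line) for one new connection."""
    start = time.monotonic()
    # Phase 1: TCP handshake with the listening socket.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        # Phase 2: wait until the server has sent at least one full line.
        data = b""
        while b"\n" not in data:
            chunk = sock.recv(256)
            if not chunk:
                break  # server closed the connection early
            data += chunk
        done = time.monotonic()
    first_line = data.split(b"\n", 1)[0].decode("ascii", "replace").strip()
    return connected - start, done - connected, first_line
```

If both numbers stay small while a full login still takes minutes, the delay is probably later in session setup (authentication, shell startup, disk access) rather than in accepting the connection.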

#6 Updated by t_dfbsd 6 months ago

Just an update that this is still an issue on v3.7.1.800.g060fb-DEVELOPMENT. I was mirror copying a 4TB PFS from one SATA drive to another in the same machine. After running approximately 12 hours, I ran the command "hammer config" on a different volume and the SSH session I was in hung for over a minute. It eventually returned. While it was hung, I tried to open a new SSH session to the machine and that took over a minute to connect.
