Bug #952
Updated by ftigeot almost 13 years ago
Hello.

Doing a backup to our DragonFly file server with rsync hangs the server under certain conditions. It only happens when I use the *--delete* option to rsync, and it seems to affect only directories with large multi-GB files. The directory that consistently reproduces it is my TV recording directory, which holds multiple files ranging from 1 to 4 GB. The directory on the server totals about 67 GB and contains old files that need to be deleted. The directory on the client machine is currently about 79 GB.

While the server is hung, I can still ping it and switch virtual consoles, but otherwise all consoles are frozen. If I hit Ctrl-C while it is frozen, I see '^C' on the screen but get no other response. If I run *top* in another console, it also freezes and stops updating, always showing zero or very little process load. On the last test, after I killed rsync on the client side with Ctrl-C, *top* briefly updated after 3 minutes, then stayed frozen for 3.5 more minutes. After a total of 6.5 minutes, the server came alive again.

It does not seem to be related to ssh: I ran the rsync daemon on the server, repeated the same test without ssh, and got the same results.

Here is the output of the last test on the client side.

  $ rsync -HOav -x --delete . alexandria::tv/recordings /tv/recordings
  building file list ... done
  deleting The Universe (Jupiter: The Giant Planet).info

Hangs here. I waited a while and finally hit Ctrl-C.

  ^Crsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(163)

The .info file and the .avi file were both gone on the server after that, but I am not sure whether the .avi file was deleted during one of the other tests.
I did get a couple of entries in /var/log/messages on the server with the following error when I ran it with the rsync daemon:

  rsyncd[3933]: rsync error: error in rsync protocol data stream (code 12) at io.c(453) [receiver=2.6.9]

I did not get that error when running under ssh, so I don't know whether it is related to the freezing problem.

On the server, I replaced the recording directory with one that held a subset of the files, about 5 recordings that needed to be deleted, and it worked fine. Then I hard linked all files into the directory from another directory, and it still worked fine, i.e.:

  mv recordings recordings.bak
  mkdir recordings
  ln recordings.bak/* recordings

As far as I can tell, it only happens when the directory holds more than a certain amount of data and rsync has to actually delete the files, not just unlink a secondary hard link. I have been able to back up just about every other directory on our client machines without any problems.

Does anybody have any theories about what might be happening?

The client machine is a NetBSD machine. The DragonFly server is running version 1.10.1-RELEASE. I also tested with a 1.11.0-DEVELOPMENT kernel and got the same results.
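
For anyone trying to reproduce this, the distinction between really deleting a file and only unlinking a secondary hard link can be checked with a small script. This is only a minimal sketch: the scratch directory, the `sample.avi` name, and the `link_count` helper are made up for illustration and are not part of the original test. It reads the link count from the second column of `ls -ld`, which should behave the same on BSD and GNU userlands.

```shell
#!/bin/sh
# Hypothetical helper: print the hard-link count of a file
# (second column of `ls -ld` output).
link_count() {
    ls -ld -- "$1" | awk '{ print $2 }'
}

# Scratch area standing in for the real recordings tree (assumption).
work=$(mktemp -d) || exit 1
mkdir "$work/recordings.bak" "$work/recordings"
echo data > "$work/recordings.bak/sample.avi"

# Same idea as `ln recordings.bak/* recordings`: a second name for the same data.
ln "$work/recordings.bak/sample.avi" "$work/recordings/sample.avi"
before=$(link_count "$work/recordings.bak/sample.avi")

# Removing one name only drops the link count; the data blocks stay allocated.
rm "$work/recordings/sample.avi"
after=$(link_count "$work/recordings.bak/sample.avi")

echo "links before: $before, after: $after"

# Clean up the scratch area.
rm -rf "$work"
```

A file whose link count is 1 loses its data blocks when it is removed, which is the case that appears to trigger the hang; a count of 2 or more means the removal only drops a name.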