Bug #952

Updated by ftigeot about 12 years ago

Hello. 

 Doing a backup to our DragonFly file server, using rsync, hangs the 
 server under certain conditions.   

 It only happens when I use the *--delete* option to rsync and seems to 
 happen only on directories with large multi-GB files. The directory that 
 consistently reproduces it is my tv recording directory, where I have 
 multiple files ranging from 1 to 4 GB. The directory on the server 
 totals about 67 GB and contains old files that need to be deleted. The 
 directory on the client machine is currently about 79 GB.

 While the server is hung, I can still ping it and switch virtual 
 consoles but, other than that, all consoles are frozen. While it 
 is frozen, if I hit Ctrl-C I see the '^C' on the screen but get no 
 other response.

 *top*, running in another console, also freezes and stops updating, 
 always showing zero or very little process load. In the last test, 
 after I killed rsync on the client side with Ctrl-C, *top* briefly 
 updated after 3 minutes, then stayed frozen for 3.5 more minutes. After 
 a total of 6.5 minutes, the server came alive again.

 It does not seem to be related to ssh because I ran the rsync daemon on 
 the server and ran the same test without ssh and got the same results. 
 Here is the output of the last test on the client side. 

   $ rsync -HOav -x --delete . alexandria::tv/recordings 
   /tv/recordings 
   building file list ... done 
   deleting The Universe (Jupiter: The Giant Planet).info 

 It hangs here. I waited a while and finally hit Ctrl-C.

   ^Crsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(163) 

 The .info file and the .avi file were both gone on the server after that 
 but I am not sure if the .avi file was deleted on one of the other 
 tests. 

 I did get a couple of entries in /var/log/messages on the server with the 
 following error when I ran it with the rsync daemon:

   rsyncd[3933]: rsync error: error in rsync protocol data stream (code 12) at io.c(453) [receiver=2.6.9] 

 I did not get that error when running over ssh, so I don't know whether 
 it is related to the freezing problem.

 On the server, I replaced the recording directory with one that had 
 a subset of the files, about 5 recordings that needed to be deleted, and it 
 worked fine. Then I hard linked all files into the directory from another 
 directory, and it still worked fine.

 i.e. 
     mv recordings recordings.bak 
     mkdir recordings 
     ln recordings.bak/* recordings 

 As far as I can tell, it only happens when the directory holds more than 
 a certain amount of data and rsync has to actually delete the files, not 
 just unlink a secondary hard link.
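
 The distinction between deleting a file and unlinking a secondary hard 
 link can be illustrated with a minimal sketch. The scratch directory and 
 the file name show.avi are hypothetical and only mirror the layout of the 
 test above; this is not a reproduction of the hang itself.

```shell
#!/bin/sh
# Sketch: removing one of two hard links only drops a name; the data stays.
# All paths are hypothetical scratch names, not the reporter's real layout.
set -e
work=$(mktemp -d "${TMPDIR:-/tmp}/hardlink.XXXXXX")
mkdir "$work/recordings.bak" "$work/recordings"
echo "sample data" > "$work/recordings.bak/show.avi"
ln "$work/recordings.bak/show.avi" "$work/recordings/show.avi"

# The second column of `ls -l` is the link count.
links_before=$(ls -l "$work/recordings.bak/show.avi" | awk '{print $2}')

# Equivalent to what --delete does to the hard-linked copy: one name goes away.
rm "$work/recordings/show.avi"
links_after=$(ls -l "$work/recordings.bak/show.avi" | awk '{print $2}')

# The contents are still reachable through the remaining name.
data_after=$(cat "$work/recordings.bak/show.avi")

echo "links before: $links_before, after: $links_after"
rm -rf "$work"
```

 Because the file system frees no data blocks in this case, a test with 
 hard-linked copies does not exercise the same path as deleting the only 
 remaining name of a multi-GB file.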

 I have been able to back up just about every other directory on our 
 client machines without any problems.

 Does anybody have any theories about what might be happening? 

 The client machine is a NetBSD machine. 
 The DragonFly server is running version 1.10.1-RELEASE.    I also 
 tested with a 1.11.0-DEVELOPMENT kernel and got the same results.
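
 One way to check whether rsync is involved at all would be to time the 
 deletion of a single large file directly on the server. The following is 
 only a sketch under assumptions: SIZE_MB, the scratch directory and the 
 file name big.avi are made up for illustration, and the small default 
 size should be raised to 1024 or more to approximate the recordings 
 described above.

```shell
#!/bin/sh
# Sketch: time the deletion of one large file locally, with no rsync involved.
# SIZE_MB and the scratch path are assumptions chosen for illustration.
set -e
SIZE_MB=${SIZE_MB:-8}
work=$(mktemp -d "${TMPDIR:-/tmp}/rmtest.XXXXXX")
big="$work/big.avi"

# Write SIZE_MB megabytes of zeros. The block size is spelled out in bytes
# because the "1m" / "1M" suffix differs between BSD and GNU dd.
dd if=/dev/zero of="$big" bs=1048576 count="$SIZE_MB" 2>/dev/null
sync

# Measure wall-clock seconds for the unlink plus the following sync.
start=$(date +%s)
rm "$big"
sync
end=$(date +%s)
elapsed=$((end - start))

echo "deleting ${SIZE_MB} MB took ${elapsed}s"
rmdir "$work"
```

 If a plain rm of a multi-GB file stalls the consoles in the same way, the 
 problem would lie in file deletion rather than in rsync or the network 
 transport.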
