Bug #1806

closed

DFBSD 2.7.3 - mbuf exhausted while rsync to a NFS

Added by tuxillo almost 12 years ago. Updated 9 days ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Description

I have two virtual machines running DFBSD. One runs under KVM (512MB mem) and the
other under VMware (1024MB).

The KVM guest is the NFS server, which exports /usr like this:
/usr -alldirs -maproot=root: -network ....

On the VMware guest I mount the export and start copying the repo using rsync:

# rsync -av --progress /usr/src /mnt/target/usr/

After a while, the following warning appears on the KVM guest (the NFS server):
Warning, objcache(mbuf): Exhausted!

# netstat -m
9056/9056 mbufs in use (current/max):
134/4528 mbuf clusters in use (current/max)
          9190 mbufs and mbuf clusters allocated to data
2532 Kbytes allocated to network (22% of mb_map in use)
163 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
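
A minimal sketch for spotting this condition from a script, assuming `netstat -m` prints the line format shown above (the format may differ between releases). `mbuf_usage` is a hypothetical helper name, not an existing tool:

```shell
# Reads `netstat -m` output on stdin and summarizes plain-mbuf usage.
# Exit status: 0 = below the limit, 1 = pool at its limit, 2 = no mbuf line found.
mbuf_usage() {
    awk '
        /mbufs in use/ {
            split($1, n, "/")            # "current/max" -> n[1], n[2]
            cur = n[1] + 0; max = n[2] + 0
        }
        /requests for memory denied/ { denied = $1 + 0 }
        END {
            if (max == 0) { print "no mbuf line found"; exit 2 }
            printf "mbufs %d/%d (%d%%), denied %d\n", cur, max, cur * 100 / max, denied
            status = (cur >= max) ? 1 : 0
            exit status
        }
    '
}
```

On the server it would be used as `netstat -m | mbuf_usage`. With the numbers above, the plain-mbuf pool is completely used (9056/9056) while only 134 of 4528 clusters are in use.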

On the client side, the copy stops:

dfbsd/.git/objects/pack/pack-eb16b18282ea58f39f353cb1c7e4786cfa544159.pack
    24084480  10%    4.10MB/s    0:00:48
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/mnt/remote/dfbsd/.git/objects/pack/pack-eb16b18282ea58f39f353cb1c7e4786cfa544159.pack": RPC struct is bad (72)
rsync error: error in file IO (code 11) at receiver.c(302) [receiver=3.0.7]
[sender] io timeout after 30 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(140) [sender=3.0.7]
[vmware] /usr/src>

I can't even ssh into the KVM machine from outside:

% ssh 192.168.3.100
antonioh@192.168.3.100's password: 
Timeout, server not responding.
%

Actions #1

Updated by dillon almost 12 years ago

Ok, this should be fixed now. nfs_realign() was calling m_copyback() which was allocating the mbuf chain using normal mbufs instead of cluster bufs, causing the normal mbufs to get blown out on machines with low amounts of memory.

-Matt
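
To illustrate why the choice of buffer type matters here, a rough back-of-the-envelope sketch follows. The sizes are illustrative assumptions, not values taken from the DragonFly source: 256 usable bytes per plain mbuf, 2048 bytes per cluster, and a 32768-byte NFS write. `bufs_needed` is a hypothetical helper.

```shell
# Number of buffers of size $2 needed to hold $1 bytes (ceiling division).
bufs_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

payload=32768      # assumed size of one NFS write, in bytes
mbuf_data=256      # assumed usable bytes per plain mbuf (illustrative)
cluster_data=2048  # assumed bytes per mbuf cluster (illustrative)

plain=$(bufs_needed "$payload" "$mbuf_data")
clustered=$(bufs_needed "$payload" "$cluster_data")
echo "plain mbufs: $plain, cluster-backed mbufs: $clustered"
```

Under these assumptions, copying one request into plain mbufs takes eight times as many mbufs as copying it into cluster-backed ones, so a small mbuf pool is drained far sooner.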

Actions #2

Updated by tuxillo almost 12 years ago

Hi Matt,

As we agreed, I've uploaded the dump files of the panic that occurred on the
NFS client side. They are in my home dir: ~/crash/1806*.1

Cheers,
Antonio Huete

Actions #3

Updated by tuxillo 9 days ago

  • Description updated (diff)
  • Assignee deleted (0)

Actions #4

Updated by tuxillo 9 days ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

Unable to reproduce this issue, hence closing.

Evidence

SERVER

root@dev01:/usr/src # sysctl hw.physmem 
hw.physmem: 500301824

root@dev01:/usr/src # cat /etc/exports  
/usr -alldirs -maproot=root: -network 10.0.0.0/24

  • Monitored mbuf usage during the copy; it stayed very low.
  • No errors in dmesg.

CLIENT

root@dev03:~ # df -h /usr/src 
Filesystem            Size   Used  Avail Capacity  Mounted on
10.0.0.101:/usr/src  44.5G  11.0G  33.5G    25%    /usr/src

root@dev03:~ # rsync -aP --delete /usr/src /mnt/target/usr/ 
sending incremental file list

root@dev03:~ # diff -urN /usr/src /mnt/target/usr/src
load: 0.00  cmd: diff 798 [running] 0.08u 1.36s 7% 3916k
root@dev03:~ #  

  • Repeated the copy multiple times.
  • Even compared the directories with diff and rsync.