Bug #1806
DFBSD 2.7.3 - mbuf exhausted while rsync to a NFS
Status: Closed
% Done: 100%
Description
I have two virtual machines running DFBSD. One runs under KVM (512MB of memory)
and the other under VMware (1024MB).
The KVM machine is the NFS server, which exports /usr like this:
/usr -alldirs -maproot=root: -network ....
From the VMware machine I mount it and start copying the repo using rsync:
# rsync -av --progress /usr/src /mnt/target/usr/
After a while the following warning appears on the KVM machine (NFS server):
Warning, objcache(mbuf): Exhausted!
# netstat -m
9056/9056 mbufs in use (current/max):
134/4528 mbuf clusters in use (current/max)
9190 mbufs and mbuf clusters allocated to data
2532 Kbytes allocated to network (22% of mb_map in use)
163 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
On the client side, the copy stops:
dfbsd/.git/objects/pack/pack-eb16b18282ea58f39f353cb1c7e4786cfa544159.pack
    24084480  10%    4.10MB/s    0:00:48
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/mnt/remote/dfbsd/.git/objects/pack/pack-eb16b18282ea58f39f353cb1c7e4786cfa544159.pack": RPC struct is bad (72)
rsync error: error in file IO (code 11) at receiver.c(302) [receiver=3.0.7]
[sender] io timeout after 30 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(140) [sender=3.0.7]
[vmware] /usr/src>
And I can't even ssh into the KVM machine from outside:
% ssh 192.168.3.100
antonioh@192.168.3.100's password:
Timeout, server not responding.
%
Updated by dillon over 14 years ago
Ok, this should be fixed now. nfs_realign() was calling m_copyback() which was allocating the mbuf chain using normal mbufs instead of cluster bufs, causing the normal mbufs to get blown out on machines with low amounts of memory.
-Matt
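To see why building a chain from normal mbufs instead of cluster bufs can drain the mbuf pool so quickly, here is a rough back-of-the-envelope sketch. The payload sizes and the 32 KiB chain length are assumptions chosen for illustration, not values taken from the DragonFly source, and the function names are hypothetical.

```python
import math

# Assumed payload sizes, for illustration only; real values depend on the kernel build.
MBUF_PAYLOAD = 200       # hypothetical usable bytes in a normal mbuf
CLUSTER_PAYLOAD = 2048   # hypothetical usable bytes in an mbuf cluster


def buffers_needed(nbytes, payload):
    """Number of buffers with the given payload size needed to hold nbytes."""
    return math.ceil(nbytes / payload)


def chain_cost(nbytes):
    """Return (normal_mbufs, cluster_backed_mbufs) needed for one chain of nbytes.

    A cluster is attached to an mbuf, so each cluster-backed entry still
    consumes one mbuf, but the total count is far smaller.
    """
    return (buffers_needed(nbytes, MBUF_PAYLOAD),
            buffers_needed(nbytes, CLUSTER_PAYLOAD))


if __name__ == "__main__":
    normal, clustered = chain_cost(32 * 1024)  # assumed chain size
    print("normal mbufs:", normal, "cluster-backed mbufs:", clustered)
```

Under these assumed numbers a single chain costs roughly ten times more mbufs without clusters, which fits the netstat -m output in the description: the mbuf limit was reached while very few clusters were in use.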
Updated by tuxillo over 14 years ago
Hi Matt,
As we agreed, I've uploaded the dump files of the panic that occurred on the
NFS client side. They are in my home dir: ~/crash/1806*.1
Cheers,
Antonio Huete
Updated by tuxillo over 2 years ago
- Description updated (diff)
- Assignee deleted (0)
Updated by tuxillo over 2 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
Unable to reproduce this issue, hence closing.
Evidence
SERVER
root@dev01:/usr/src # sysctl hw.physmem
hw.physmem: 500301824
root@dev01:/usr/src # cat /etc/exports
/usr -alldirs -maproot=root: -network 10.0.0.0/24
- I have been monitoring mbuf usage; it stays really low during the copy.
- No errors in dmesg.
CLIENT
root@dev03:~ # df -h /usr/src
Filesystem            Size   Used  Avail Capacity  Mounted on
10.0.0.101:/usr/src  44.5G  11.0G  33.5G    25%    /usr/src
root@dev03:~ # rsync -aP --delete /usr/src /mnt/target/usr/
sending incremental file list
root@dev03:~ # diff -urN /usr/src /mnt/target/usr/src
load: 0.00  cmd: diff 798 [running] 0.08u 1.36s 7% 3916k
root@dev03:~ #
- Repeated the copy multiple times.
- Even compared the directories with diff and rsync.
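
The mbuf monitoring mentioned in the evidence could be automated along these lines. This is only a sketch: it assumes the `netstat -m` text has the same "current/max ... in use" layout as the output quoted in the description, and the function names and the 90% threshold are hypothetical choices.

```python
import re

# Sample text copied from the netstat -m output in the original report.
SAMPLE = """9056/9056 mbufs in use (current/max):
134/4528 mbuf clusters in use (current/max)
163 requests for memory denied
"""


def parse_in_use(text):
    """Extract (current, max) pairs for mbufs and mbuf clusters."""
    stats = {}
    pattern = r"(\d+)/(\d+) (mbufs|mbuf clusters) in use"
    for cur, mx, name in re.findall(pattern, text):
        stats[name] = (int(cur), int(mx))
    return stats


def near_exhaustion(stats, threshold=0.9):
    """Names of pools whose usage is at or above the given fraction of max."""
    return [name for name, (cur, mx) in stats.items()
            if mx and cur / mx >= threshold]


if __name__ == "__main__":
    stats = parse_in_use(SAMPLE)
    print(stats)
    print("near exhaustion:", near_exhaustion(stats))
```

Feeding it the output captured during the original failure flags the mbufs pool, while clusters stay far below their limit.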