Bug #1726
tmpfs "malloc limit exceeded" panic (status: closed)
Description
When running fsstress on a tmpfs /tmp filesystem, I got a panic 'tmpfs node:
malloc limit exceeded' on DragonFly 2.6.0.43.gfa1ae-RELEASE.
Kernel at http://endeavour.zapto.org/dfly/tmpfs_20100413_panic/kern.0.gz, vmcore
at http://endeavour.zapto.org/dfly/tmpfs_20100413_panic/vmcore.0.gz
Updated by tuxillo over 14 years ago
Venk,
Both kern and vmcore are inaccessible:
The requested URL /dfly/tmpfs_20100413_panic/vmcore.0.gz was not found on this
server.
Cheers,
Antonio Huete
Updated by vsrinivas over 14 years ago
Hey, sorry, I removed them for lack of space.
http://endeavour.zapto.org/src/tmpfskern.0.gz
and
http://endeavour.zapto.org/src/tmpfsvmcore.0.gz
exhibit the same problem, on a kernel built from git today. These dumps are from
a system with 64MB of RAM, but the panics also appear on a machine with 1GB of RAM.
-- vs
Updated by vsrinivas over 14 years ago
I believe
http://gitweb.dragonflybsd.org/dragonfly.git/commit/42f6f6b1b2dcc2ca10d31421d2dd6273851e012d,
http://gitweb.dragonflybsd.org/dragonfly.git/commit/8e771504ede4fe826607300e9e4c0c7444652cc4, and
http://gitweb.dragonflybsd.org/dragonfly.git/commit/dcaa8a41662f2b0cf579a6e912564c9fc8275ac1
allow it to survive this. I've tested on a vkernel only, would
appreciate testing on real hardware.
Updated by vsrinivas over 14 years ago
While we will now survive fsstress (since the name zone hits its limit before the
dirent zone does under fsstress), the basic problem remains for dirent structures:
limits can still be reached, and tmpfs wasn't accounting for the resources used by
each zone. A well-written test that makes many symlinks while minimizing name zone
usage would still panic the kernel.
Before I close this bug, it'd be nice if more people could confirm/deny that
fsstress can run on tmpfs.
Things to do to make the world better:
1) Move the tmpfs name zone from a global malloc zone to a per-mount zone
2) Convert dirent allocations from M_WAITOK to M_WAITOK | M_NULLOK and handle
the NULL return case; there are only two places where dirents are allocated, so
this wouldn't be too bad (a rough sketch follows below).
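For illustration, the second item could look roughly like the sketch below. This is
a kernel-side fragment, not the actual tmpfs code: the tm_dirent_zone field is a
hypothetical per-mount malloc type, and only the allocation itself is shown.

/*
 * Sketch only: allocate a tmpfs dirent without panicking when the
 * zone limit is hit.  M_NULLOK makes kmalloc() return NULL at the
 * limit instead of calling panic().
 */
static struct tmpfs_dirent *
tmpfs_dirent_alloc_sketch(struct tmpfs_mount *tmp)
{
	struct tmpfs_dirent *de;

	de = kmalloc(sizeof(*de), tmp->tm_dirent_zone,
	             M_WAITOK | M_NULLOK | M_ZERO);
	return (de);	/* the two callers would map NULL to ENOSPC */
}

The two allocation sites would then check for a NULL return and fail the operation
with ENOSPC instead of letting kmalloc() hit the limit and panic.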
Updated by thomas.nikolajsen over 14 years ago
I did test, and it eventually panicked on memory exhaustion;
would you like a dump?
The test was on an SMP kernel on a 2GB RAM system;
on a slower UP system, also with 2GB RAM, I had no panic when running over night
(too slow?).
I also ran fsx; both programs were run with the parameters in the do* file shipped
with each program.
On the UP system I saw this message on the console at shutdown:
Warning: deep namecache recursion at (null)
I don't know if this happened during the run or at shutdown;
the fs test programs were stopped (^Z) at shutdown.
Updated by ahuete.devel over 14 years ago
Venk,
I could panic the kernel fairly easily on a 256MB VM. It hit the malloc
limit in about 2 minutes of running fsstress:
http://www.imgpaste.com/i/ilplf.jpg
Cheers,
Antonio Huete
Updated by ahuete.devel over 14 years ago
Hi,
Mount command was:
- sudo mount -t tmpfs tmpfs /mnt/tmpfs/
- vmstat -m | grep tmpfs
tmpfs node 1 1K 0K 30924K 1 0 0
tmpfs mount 1 1K 0K 24830K 1 0 0
Cheers,
Antonio Huete
Updated by vsrinivas over 14 years ago
Hi,
I just committed 881dac8bcf7f6e26635fa38f071b93347ef92192, which I think solves
the problem tuxillo hit. I'd love it if people tried it out - last time I thought
tmpfs was solved, it wasn't :D.
The fix allows the malloc zone for nodes to return NULL when its limit is
exhausted or when we are unable to satisfy the malloc (I've seen that happen on
some low-memory systems here); tmpfs_node_init also could not survive a NULL node,
which I just fixed as well.
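Roughly, the shape of the node-side change is sketched below. This is written from
the description above, not copied from the commit; tm_node_zone and the exact error
path are assumptions.

/*
 * Sketch: node allocation that tolerates a NULL return from the
 * node zone and reports ENOSPC to the caller.
 */
int
tmpfs_alloc_node_sketch(struct tmpfs_mount *tmp, struct tmpfs_node **nodepp)
{
	struct tmpfs_node *node;

	node = kmalloc(sizeof(*node), tmp->tm_node_zone,
	               M_WAITOK | M_NULLOK | M_ZERO);
	if (node == NULL) {
		/* Zone limit exhausted, or the malloc could not be satisfied. */
		*nodepp = NULL;
		return (ENOSPC);
	}
	/* ... the usual node initialization would follow here ... */
	*nodepp = node;
	return (0);
}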
-- vs
Updated by vsrinivas over 14 years ago
thomas - the deep namecache recursions 'should' be happening on shutdown
(actually unmount); they are a real problem - a full tmpfs (by nodes) on a
system with 1.5GB of RAM takes upwards of 5 min to unmount. Why the ncp->nc_name
field is empty I don't know either....
status:
- tmpfs should survive fsstress at the moment
- more than one tmpfs mount will be a problem, since the name zone is shared
- there are workloads which will still panic it...
- unmount after fsstress takes a long time
stuff to do still standing:
- Move the tmpfs name zone from a global malloc zone to a per-mount zone (see
the sketch after this list)
- Convert dirent allocations from M_WAITOK to M_WAITOK | M_NULLOK and handle
the null return case; there are only two places that dirents are allocated, so
this wouldn't be too bad.
- Figure out why the unmount is hitting so many namecache entries with null
names
- Write a link stress test (something that makes a lot of links) to see if we
can exhaust the dirent zone currently.
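For the first item, a per-mount name zone could be set up roughly as sketched
below, assuming DragonFly's kmalloc_create()/kmalloc_destroy() per-type API and
kmalloc_raise_limit() to size the zone; the field names (tm_name_zone,
tm_nodes_max) are illustrative, not necessarily the real struct tmpfs_mount layout.

/*
 * Sketch: per-mount name zone, created at mount time and torn down
 * at unmount, so one full tmpfs cannot starve other mounts.
 */
static void
tmpfs_create_name_zone_sketch(struct tmpfs_mount *tmp)
{
	kmalloc_create(&tmp->tm_name_zone, "tmpfs name zone");
	/* Bound the zone roughly by the mount's node limit. */
	kmalloc_raise_limit(tmp->tm_name_zone,
	                    MAXNAMLEN * (size_t)tmp->tm_nodes_max);
}

static void
tmpfs_destroy_name_zone_sketch(struct tmpfs_mount *tmp)
{
	kmalloc_destroy(&tmp->tm_name_zone);
}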
Updated by tuxillo over 14 years ago
Venk,
After 3h testing w/ fsstress I didn't have any panics, but on shutdown the
namecache recursion issue is still there.
Cheers,
Antonio Huete
Updated by vsrinivas over 14 years ago
Tux: That's good to hear!
This test program:
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
int
main(void)
{
	int i;
	char id[320] = {0};

	/* Create a large number of hard links to exhaust the dirent zone. */
	for (i = 0; i < 10000000; i++) {
		sprintf(id, "%09d", i);
		link("sin.c", id);
	}
	return 0;
}
----
I expected it to exhaust the tmpfs dirent zone on a low-memory system, where the
dirent zone limit was less than the system limit on hardlinks. Instead I
exhausted the vfscache zone:
panic: vfscache: malloc limit exceeded
mp_lock = 00000000; cpuid = 0
Trace beginning at frame 0x54ee5a10
panic(ffffffff,54ee5a38,55492c08,82d43e0,40400840) at 0x80e1d33
panic(8287563,829176f,4c8b7339,0,55492c08) at 0x80e1d33
kmalloc(a,82d43e0,2,0,54ee5bec) at 0x80df67c
cache_unlock(0,0,52b48d00,52ba4b00,40400000) at 0x812c274
cache_nlookup(54ee5bec,54ee5af4,54ee5bec,54ee5bec,40400000) at 0x81302ed
nlookup(54ee5bec,5503e4c8,54ee5c24,52a45540,5503e4c8) at 0x8138665
kern_link(54ee5c24,54ee5bec,552881d8,52ba4b00,526dc698) at 0x8141aa6
sys_link(54ee5c94,0,0,82c46cc,292) at 0x8147475
syscall2(54ee5d40,52a1dd40,0,0,54ee5d38) at 0x8265d6d
user_trap(54ee5d40,54e8bb88,82667bd,0,0) at 0x82660af
go_user(54ee5d38,0,0,7b,0) at 0x826663e
Debugger("panic")
CPU0 stopping CPUs: 0x00000000
stopped
Stopped at 0x826352d: movb $0,0x83f6194
db>
(Hardlinks are one of the two things in tmpfs that allocate dirents; the other
allocation site is already bounded by the node limits, so it's not a problem.)
Updated by eocallaghan over 14 years ago
I was able to reproduce an equivalent panic on HAMMER with the test case from
vsrinivas above:
(kgdb) bt
#0 _get_mycpu (di=0xc06d4ca0) at ./machine/thread.h:83
#1 md_dumpsys (di=0xc06d4ca0)
at /usr/src/sys/platform/pc32/i386/dump_machdep.c:263
#2 0xc0304d15 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:880
#3 0xc03052d5 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xc030559e in panic (fmt=0xc05bb41b "%s: malloc limit exceeded")
at /usr/src/sys/kern/kern_shutdown.c:786
#5 0xc03032bb in kmalloc (size=25, type=0xc1d8f590, flags=258)
at /usr/src/sys/kern/kern_slaballoc.c:503
#6 0xc04aa5a3 in hammer_alloc_mem_record (ip=0xcb803d50, data_len=25)
at /usr/src/sys/vfs/hammer/hammer_object.c:280
#7 0xc04aa91f in hammer_ip_add_directory (trans=0xce350ad4,
dip=0xcb803d50, name=0xd3cdb1d0 "000452457", bytes=9, ip=0xce31df50)
at /usr/src/sys/vfs/hammer/hammer_object.c:666
#8 0xc04bbf8a in hammer_vop_nlink (ap=0xce350b2c)
at /usr/src/sys/vfs/hammer/hammer_vnops.c:1388
#9 0xc036cc1f in vop_nlink_ap (ap=0xce350b2c)
at /usr/src/sys/kern/vfs_vopops.c:1978
#10 0xc03717ca in null_nlink (ap=0xce350b2c)
at /usr/src/sys/vfs/nullfs/null_vnops.c:164
#11 0xc036d465 in vop_nlink (ops=0xcdbbe030, nch=0xce350c48,
dvp=0xce0913e8, vp=0xce2f04e8, cred=0xcdef1738)
at /usr/src/sys/kern/vfs_vopops.c:1397
#12 0xc0365496 in kern_link (nd=0xce350c80, linknd=0xce350c48)
at /usr/src/sys/kern/vfs_syscalls.c:2320
#13 0xc036ad49 in sys_link (uap=0xce350cf0)
at /usr/src/sys/kern/vfs_syscalls.c:2345
#14 0xc055f6b3 in syscall2 (frame=0xce350d40)
at /usr/src/sys/platform/pc32/i386/trap.c:1310
#15 0xc0547fb6 in Xint0x80_syscall ()
at /usr/src/sys/platform/pc32/i386/exception.s:876
#16 0x0000001f in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(kgdb)
Cheers,
Edward.
Updated by vsrinivas over 14 years ago
Hi,
I just converted the tmpfs name zone from a systemwide zone to a per-mount zone.
All of the panics coming directly from tmpfs zone exhaustion should now be taken
care of, so I think it'd be worth closing this bug and opening two new ones: one
for the vfscache exhaustion and one for the deep recursion on unmount.
I'll mark this as 'testing' until more people can try beating up on tmpfs.
-- vs
Updated by vsrinivas about 14 years ago
Per the latest testing from tuxillo, this finally seems to be resolved! Cheers!
[As a warning: the namecache recursion on unmount and the vfscache limit bugs
still exist.]
-- vs