Bug #1726

tmpfs "malloc limit exceeded" panic

Added by vsrinivas over 4 years ago. Updated about 4 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

When running fsstress on a tmpfs /tmp filesystem, I got a panic 'tmpfs node:
malloc limit exceeded' on DragonFly 2.6.0.43.gfa1ae-RELEASE.

Kernel at http://endeavour.zapto.org/dfly/tmpfs_20100413_panic/kern.0.gz, vmcore
at http://endeavour.zapto.org/dfly/tmpfs_20100413_panic/vmcore.0.gz

History

#1 Updated by tuxillo about 4 years ago

Venk,

Both kern and vmcore are inaccessible:

The requested URL /dfly/tmpfs_20100413_panic/vmcore.0.gz was not found on this
server.

Cheers,
Antonio Huete

#2 Updated by vsrinivas about 4 years ago

Hey, sorry, I removed them for lack of space.

http://endeavour.zapto.org/src/tmpfskern.0.gz
and
http://endeavour.zapto.org/src/tmpfsvmcore.0.gz

exhibit the same problem, on a kernel built from git today. These dumps are from
a system with 64MB of RAM, but the problem also appears on a machine with 1GB of RAM.

-- vs

#3 Updated by vsrinivas about 4 years ago

#4 Updated by vsrinivas about 4 years ago

We will now survive fsstress, since under fsstress the name zone hits its limit
before the dirent zone does. The basic problem (that zone limits can be reached,
and tmpfs wasn't counting the resources it takes from each zone) remains for
dirent structures, though. A well-written test that makes many symlinks while
minimizing name zone usage would still panic the kernel.

Before I close this bug, it'd be nice if more people could confirm/deny that
fsstress can run on tmpfs.

Things to do to make the world better:
1) Move the tmpfs name zone from a global malloc zone to a per-mount zone
2) Convert dirent allocations from M_WAITOK to M_WAITOK | M_NULLOK and handle
the null return case; there are only two places that dirents are allocated, so
this wouldn't be too bad.

#5 Updated by thomas.nikolajsen about 4 years ago

I did test it, and it eventually panicked on memory exhaustion;
would you like the dump?

The test was on an SMP kernel on a 2GB RAM system;
on a slower UP system, also with 2GB RAM, I had no panic running overnight (too
slow?).
I also ran fsx; both programs were run with the parameters from the do* file
that comes with each program.

On the UP system I saw this message on the console at shutdown:
Warning: deep namecache recursion at (null)
I don't know whether it happened during the run or at shutdown;
the fs test programs were stopped (^Z) at shutdown.

#6 Updated by ahuete.devel about 4 years ago

Venk,

I could panic the kernel fairly easily on a 256MB VM. It hit the malloc
limit after about 2 minutes of running fsstress:

http://www.imgpaste.com/i/ilplf.jpg

Cheers,
Antonio Huete

#7 Updated by ahuete.devel about 4 years ago

Hi,

Mount command was:

# sudo mount -t tmpfs tmpfs /mnt/tmpfs/
# vmstat -m | grep tmpfs
tmpfs node 1 1K 0K 30924K 1 0 0
tmpfs mount 1 1K 0K 24830K 1 0 0

Cheers,
Antonio Huete

#8 Updated by vsrinivas about 4 years ago

Hi,

I just committed 881dac8bcf7f6e26635fa38f071b93347ef92192, which I think solves
the problem tuxillo hit. I'd love it if people tried it out; last time I thought
tmpfs was solved, it wasn't :D.

The fix allows the malloc zone for nodes to return NULL when its limit is
exhausted or when we are unable to satisfy the malloc (I've seen that on some
low-memory systems here); tmpfs_node_init did not survive a NULL node either,
which I just fixed.

-- vs

#9 Updated by vsrinivas about 4 years ago

thomas: the deep namecache recursions 'should' be happening on shutdown
(actually on unmount); they are a real problem: a full tmpfs (by nodes) on a
system with 1.5GB of RAM takes upwards of 5 minutes to unmount. Why the
ncp->nc_name field is empty, I don't know either....

status:
- tmpfs should survive fsstress at the moment
- more than one tmpfs mount will be a problem, since the name zone is shared
- there are workloads that will still panic it...
- unmount after fsstress takes a long time

stuff to do still standing:
- Move the tmpfs name zone from a global malloc zone to a per-mount zone
- Convert dirent allocations from M_WAITOK to M_WAITOK | M_NULLOK and handle
the null return case; there are only two places that dirents are allocated, so
this wouldn't be too bad.
- Figure out why the unmount is hitting so many namecache entries with null
names
- Write a link stress test (something that makes a _lot_ of links) to see
whether we can currently exhaust the dirent zone.

#10 Updated by tuxillo about 4 years ago

Venk,

After 3 hours of testing with fsstress I didn't have any panics, but the
namecache recursion issue is still there on shutdown.

Cheers,
Antonio Huete

#11 Updated by vsrinivas about 4 years ago

Tux: That's good to hear!

----

This test program:

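#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

/*
 * Link stress test: create up to 10,000,000 hard links to "sin.c".
 * Run it from inside the tmpfs mount, with sin.c already present.
 */
int
main(void)
{
	int i;
	char id[320] = { 0 };

	for (i = 0; i < 10000000; i++) {
		/* Short, fixed-width names keep name zone usage small. */
		snprintf(id, sizeof(id), "%09d", i);
		if (link("sin.c", id) == -1) {
			perror("link");
			return 1;
		}
	}

	return 0;
}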
----
I expected it to exhaust the tmpfs dirent zone on a low-memory system, where the
dirent zone limit was lower than the system limit on hard links. Instead, I
exhausted the vfscache zone:

panic: vfscache: malloc limit exceeded
mp_lock = 00000000; cpuid = 0
Trace beginning at frame 0x54ee5a10
panic(ffffffff,54ee5a38,55492c08,82d43e0,40400840) at 0x80e1d33
panic(8287563,829176f,4c8b7339,0,55492c08) at 0x80e1d33
kmalloc(a,82d43e0,2,0,54ee5bec) at 0x80df67c
cache_unlock(0,0,52b48d00,52ba4b00,40400000) at 0x812c274
cache_nlookup(54ee5bec,54ee5af4,54ee5bec,54ee5bec,40400000) at 0x81302ed
nlookup(54ee5bec,5503e4c8,54ee5c24,52a45540,5503e4c8) at 0x8138665
kern_link(54ee5c24,54ee5bec,552881d8,52ba4b00,526dc698) at 0x8141aa6
sys_link(54ee5c94,0,0,82c46cc,292) at 0x8147475
syscall2(54ee5d40,52a1dd40,0,0,54ee5d38) at 0x8265d6d
user_trap(54ee5d40,54e8bb88,82667bd,0,0) at 0x82660af
go_user(54ee5d38,0,0,7b,0) at 0x826663e
Debugger("panic")

CPU0 stopping CPUs: 0x00000000
stopped
Stopped at 0x826352d: movb $0,0x83f6194
db>

(Hard links are one of the two things in tmpfs that allocate dirents; the other
allocation is already bounded by the node limits, so it's not a problem.)

#12 Updated by eocallaghan about 4 years ago

I was able to reproduce an equivalent panic on HAMMER with the test case from
vsrinivas above.

(kgdb) bt
#0 _get_mycpu (di=0xc06d4ca0) at ./machine/thread.h:83
#1 md_dumpsys (di=0xc06d4ca0)
at /usr/src/sys/platform/pc32/i386/dump_machdep.c:263
#2 0xc0304d15 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:880
#3 0xc03052d5 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xc030559e in panic (fmt=0xc05bb41b "%s: malloc limit exceeded")
at /usr/src/sys/kern/kern_shutdown.c:786
#5 0xc03032bb in kmalloc (size=25, type=0xc1d8f590, flags=258)
at /usr/src/sys/kern/kern_slaballoc.c:503
#6 0xc04aa5a3 in hammer_alloc_mem_record (ip=0xcb803d50, data_len=25)
at /usr/src/sys/vfs/hammer/hammer_object.c:280
#7 0xc04aa91f in hammer_ip_add_directory (trans=0xce350ad4,
dip=0xcb803d50, name=0xd3cdb1d0 "000452457", bytes=9, ip=0xce31df50)
at /usr/src/sys/vfs/hammer/hammer_object.c:666
#8 0xc04bbf8a in hammer_vop_nlink (ap=0xce350b2c)
at /usr/src/sys/vfs/hammer/hammer_vnops.c:1388
#9 0xc036cc1f in vop_nlink_ap (ap=0xce350b2c)
at /usr/src/sys/kern/vfs_vopops.c:1978
#10 0xc03717ca in null_nlink (ap=0xce350b2c)
at /usr/src/sys/vfs/nullfs/null_vnops.c:164
#11 0xc036d465 in vop_nlink (ops=0xcdbbe030, nch=0xce350c48,
dvp=0xce0913e8, vp=0xce2f04e8, cred=0xcdef1738)
at /usr/src/sys/kern/vfs_vopops.c:1397
#12 0xc0365496 in kern_link (nd=0xce350c80, linknd=0xce350c48)
at /usr/src/sys/kern/vfs_syscalls.c:2320
#13 0xc036ad49 in sys_link (uap=0xce350cf0)
at /usr/src/sys/kern/vfs_syscalls.c:2345
#14 0xc055f6b3 in syscall2 (frame=0xce350d40)
at /usr/src/sys/platform/pc32/i386/trap.c:1310
#15 0xc0547fb6 in Xint0x80_syscall ()
at /usr/src/sys/platform/pc32/i386/exception.s:876
#16 0x0000001f in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(kgdb)

Cheers,
Edward.

#13 Updated by vsrinivas about 4 years ago

Hi,

I just converted the tmpfs name zone from a systemwide zone to a per-mount zone.
All of the panics caused directly by tmpfs zone exhaustion should be taken care
of, so I think it'd be worth closing this bug and opening two new ones: one for
the vfscache exhaustion and one for the deep recursion on unmount.

I'll mark this as testing until more people have had a chance to beat up on tmpfs.

-- vs

#14 Updated by vsrinivas about 4 years ago

Per last testing from tuxillo, this seems to be finally resolved! Cheers!

[As a warning: the namecache recursion on unmount and the vfscache limit bugs
still exist.]

-- vs
