Bug #2793
objcache exhausted - hard lock
Status: Closed
Description
I have a relatively fresh DragonFly BSD system:
DragonFly v4.1.0.632.g2d4b0-DEVELOPMENT (GPR_DF) #0: Mon Feb 2 14:28:49 MSK 2015
It runs inside some virtualization software (I suppose it is qemu, but I'm not sure).
It has 512MB RAM, a 160GB HAMMER partition, and a vtnet network interface.
Under memory pressure (buildworld after the gcc5 import, or building something like
ports/devel/boost-libs) it hangs, printing the following message on the console:
"Warning, objcache(mbuf pkt hdr + cluster): Exhausted!"
After that the system is hard locked: the console doesn't react and the network is inaccessible.
If I turn off the network (ifconfig vtnet0 down), the system hangs the same way, but
without the "objcache exhausted" message, so I don't think it is a network issue.
I suppose some kernel memory pool is exhausted (the same pool from which "mbuf
pkt hdr + cluster" takes memory).
"netstat -n" just before hang:
Wed Feb 11 15:34:55 MSK 2015
2/18048 mbufs in use (current/max):
256/1440 mbuf clusters in use (current/max)
0/720 mbuf jumbo clusters in use (current/max)
258 mbufs and mbuf clusters allocated to data
513 Kbytes allocated to network (4% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
Updated by gpr almost 10 years ago
"ps axl" just before hang:
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 1 0 0 -12 -64 0 0 - B0L ?? 24:55.85 (idle_0)
0 -1 0 0 -12 -64 0 0 syncer B0L ?? 0:00.06 (syncer4)
0 -1 0 0 -12 -64 0 0 waiting B0L ?? 0:00.00 (consttyd)
0 -1 0 0 -12 -64 0 0 vlruwt B0L ?? 0:00.41 (vnlru)
0 -1 0 0 -12 -64 0 0 psleep B0L ?? 0:00.07 (bufdaemon)
0 -1 0 0 -12 -64 0 0 psleep B0L ?? 0:03.51 (bufdaemon_hw)
0 -1 0 0 -1 -53 0 0 pgzero B0L ?? 0:05.10 (pagezero)
0 -1 0 0 -12 -64 0 0 psleep B0L ?? 0:00.00 (vmdaemon)
0 -1 0 0 -12 -64 0 0 csleep B0L ?? 0:00.01 (swapcached)
0 -1 0 0 -12 -64 0 0 psleep B0L ?? 0:24.59 (pagedaemon)
0 -1 0 0 -12 -64 0 0 hmrssw B0L ?? 0:01.40 (hammer-S3)
0 -1 0 0 -12 -64 0 0 hmrssw B0L ?? 0:01.19 (hammer-S2)
0 -1 0 0 -12 -64 0 0 hmrssw B0L ?? 0:01.21 (hammer-S1)
0 -1 0 0 -12 -64 0 0 hmrssw B0L ?? 0:00.97 (hammer-S0)
0 -1 0 0 -12 -64 0 0 hmrwwa B0L ?? 0:03.21 (hammer-M)
0 -1 0 0 -12 -64 0 0 syncer B0L ?? 0:00.91 (syncer3)
0 -1 0 0 -12 -64 0 0 rwait B0L ?? 0:01.54 (random)
0 -1 0 0 -12 -64 0 0 syncer B0L ?? 0:00.01 (syncer2)
0 -1 0 0 -12 -64 0 0 syncer B0L ?? 0:00.10 (syncer1)
0 -1 0 0 -28 -80 0 0 - B0L ?? 0:00.00 (ithread1 0)
0 -1 0 0 -28 -80 0 0 - B0L ?? 0:20.95 (ithread11 0)
0 -1 0 0 -28 -80 0 0 - B0L ?? 0:00.02 (ithread15 0)
0 -1 0 0 -28 -80 0 0 - B0L ?? 0:00.00 (ithread14 0)
0 -1 0 0 -28 -80 0 0 - B0L ?? 0:00.00 (ithread9 0)
0 -1 0 0 -12 -64 0 0 waitport B0L ?? 0:00.00 (acpi_task)
0 -1 0 0 -14 -66 0 0 - B0L ?? 0:00.00 (ithread195 0)
0 -1 0 0 -12 -64 0 0 ccb_scan B0L ?? 0:00.00 (xpt_thrd)
0 -1 0 0 -14 -66 0 0 - B0L ?? 0:00.00 (ithread197 0)
0 -1 0 0 -12 -64 0 0 crypto_r B0L ?? 0:00.00 (crypto returns)
0 -1 0 0 -12 -64 0 0 crypto_w B0L ?? 0:00.00 (crypto 0)
0 -1 0 0 -12 -64 0 0 tqthr B0L ?? 0:00.00 (firmware taskq)
0 -1 0 0 -12 -64 0 0 tqthr B0L ?? 0:00.00 (taskq_cpu 0)
0 -1 0 0 -12 -64 0 0 waitport B0L ?? 0:00.00 (ifnet 0)
0 -1 0 0 -12 -64 0 0 waitport B0L ?? 0:01.00 (netisr_cpu 0)
0 -1 0 0 -12 -64 0 0 waitport B0L ?? 0:00.00 (disk_msg_core)
0 -1 0 0 -12 -64 0 0 waitport B0L ?? 0:00.39 (devfs_msg_core)
0 -1 0 0 -12 -64 0 0 - B0L ?? 0:00.00 (dsched 0)
0 -1 0 0 -2 -54 0 0 schslp B0L ?? 0:00.00 (usched 0)
0 -1 0 0 -2 -54 0 0 sched_th B0L ?? 0:00.00 (usched 0)
0 -1 0 0 -14 -66 0 0 - B0L ?? 0:00.00 (ithread196 0)
0 -1 0 0 -12 -64 0 0 - B0L ?? 0:00.00 (ithreadE 0)
0 -1 0 0 -12 -64 0 0 - B0L ?? 0:01.94 (softclock 0)
0 -1 0 0 -12 -64 0 0 syncer B0L ?? 0:00.00 (syncer6)
0 -1 0 0 -12 -64 0 0 syncer B0L ?? 0:00.02 (syncer5)
0 0 -1 23 166 0 0 0 nowork D0Ls ?? 0:01.45 (swapper)
0 1 0 1 152 0 1900 40 wait I0Ls ?? 0:00.00 /sbin/init -
0 424 1 0 152 0 6928 900 kqread I0Ls ?? 0:00.33 /usr/sbin/syslogd -ss
0 622 1 0 152 0 7276 936 nanslp S0Ls ?? 0:00.11 /usr/sbin/dntpd -s pool.ntp.org
0 750 1 2 153 0 30320 928 kqread I0Ls ?? 0:00.00 /usr/sbin/sshd
0 776 1 0 152 0 9668 928 nanslp I0Ls ?? 0:00.00 /usr/sbin/cron
0 878 750 8 157 0 33072 1192 kqread I0Ls ?? 0:00.05 sshd: gpr [priv] (sshd)
1001 880 878 0 152 0 35372 1268 kqread S0L ?? 0:00.22 sshd: gpr@pts/0 (sshd)
1001 884 883 0 152 0 14368 1216 kqread S0Ls ?? 0:00.18 screen csh
0 917 1 0 152 0 4388 656 kqread I0Ls ?? 0:00.01 /usr/sbin/rtsold vtnet0
0 921 750 4 154 0 33072 1192 kqread I0Ls ?? 0:00.02 sshd: gpr [priv] (sshd)
1001 923 921 0 152 0 35372 1132 kqread I0L ?? 0:00.11 sshd: gpr@pts/2 (sshd)
1001 927 926 0 152 0 14368 1052 kqread I0Ls ?? 0:00.04 screen csh
0 1144 750 7 156 0 33072 1200 kqread I0Ls ?? 0:00.05 sshd: gpr [priv] (sshd)
1001 1158 1144 1 152 0 35180 1356 kqread S0L ?? 0:00.52 sshd: gpr@pts/4 (sshd)
0 1194 750 2 153 0 33072 1200 kqread I0Ls ?? 0:00.02 sshd: gpr [priv] (sshd)
1001 1208 1194 0 152 0 35372 1300 kqread S0L ?? 0:00.45 sshd: gpr@pts/5 (sshd)
1001 1228 1227 0 152 0 14368 1224 kqread S0Ls ?? 0:00.58 screen csh
0 875 1 13 160 0 6912 920 ttyin I0Ls+ v1 0:00.01 /usr/libexec/getty Pc ttyv1
0 876 1 6 156 0 6912 920 ttyin I0Ls+ v2 0:00.00 /usr/libexec/getty Pc ttyv2
0 877 1 6 156 0 6912 920 ttyin I0Ls+ v3 0:00.00 /usr/libexec/getty Pc ttyv3
1001 881 880 3 154 0 18532 1012 pause I0Ls 0 0:00.08 -tcsh (tcsh)
1001 883 881 0 152 0 12596 1216 pause S0L+ 0 0:00.04 screen csh
1001 885 884 0 152 0 16240 992 pause I0Ls 1 0:00.08 csh
1001 930 885 17 162 0 10132 972 wait I0L+ 1 0:00.06 make
1001 1001 930 9 157 0 10260 968 wait I0L+ 1 0:00.05 make CONFIG_DONE_BOOST-LIBS=1 /us
1001 1295 1001 25 167 0 47464 3304 kqread S0L+ 1 0:04.26 bjam --layout=system --prefix=/us
1001 7115 1295 27 168 0 6344 1716 wait S0L 1 0:00.02 (c++)
1001 7116 7115 128 230 0 243052 234560 - R0L 1 0:06.91 (cc1plus)
1001 7117 7115 33 172 0 6548 5612 piperd S0L 1 0:00.02 /usr/libexec/binutils224/elf/as -
1001 924 923 4 154 0 18532 1012 pause I0Ls 2 0:00.04 -tcsh (tcsh)
1001 926 924 0 152 0 12596 1220 pause S0L+ 2 0:00.04 screen csh
1001 928 927 0 152 0 16240 996 pause I0Ls 3 0:00.02 csh
1001 1135 928 2 177 20 10068 1056 wait S0NL+ 3 0:02.06 sh ps.sh
1001 7160 1135 2 177 20 9364 1604 - R0NL+ 3 0:00.01 ps axl
1001 1163 1158 13 160 0 18532 1012 pause I0Ls 4 0:00.02 -tcsh (tcsh)
1001 1177 1163 0 152 0 14972 1556 kqread S0L+ 4 0:01.95 top
1001 1213 1208 4 154 0 18532 1012 pause I0Ls 5 0:00.04 -tcsh (tcsh)
1001 1227 1213 0 152 0 12596 1240 pause S0L+ 5 0:00.06 screen csh
1001 1229 1228 2 153 0 16236 992 pause I0Ls 6 0:00.02 csh
1001 1251 1229 0 152 0 11004 1052 nanslp S0L+ 6 0:00.87 vmstat -w1
Updated by gpr almost 10 years ago
- File ps-axl.txt added
"ps axl" just before hang attached
Updated by gpr almost 10 years ago
"vmstat -w1" just before hang:
procs memory page disks faults cpu
1 0 0 0 60744 12010 0 0 0 644 0 0 0 393 768 408 83 17 0
1 0 0 0 25428 10042 0 0 0 594 0 0 0 305 781 281 80 20 0
1 0 0 0 22300 11000 3 5 18 7144 423272 46 0 350 789 378 58 42 0
2 0 0 0 13064 8393 0 0 378 9045 203454 445 0 532 155 677 47 45 8
1 0 0 0 19040 2177 47 4 572 4128 40932 562 0 679 696 1151 67 31 1
Updated by gpr almost 10 years ago
- Subject changed from objcache exhausted to objcache exhausted - hard lock
Updated by gpr over 9 years ago
I have to use the attached patch as a workaround for buildworld; it allows me to finish buildworld successfully. Of course, it does not solve the issue, it just eases upgrading for me.
Updated by gpr over 8 years ago
After the series of commits b5a81216fcb5c05639f98aeed1ae37853fa0d2da..ba08e4d11d410a85b602eeadceabcc14e40bfb1f to virtio/block, everything works fine. It seems the bug was in the virtio/block driver and has since been fixed.
Updated by gpr over 8 years ago
- Category changed from VM subsystem to Other
- Status changed from New to Resolved
Bug in virtio/block; fixed in b5a81216fcb5c05639f98aeed1ae37853fa0d2da..ba08e4d11d410a85b602eeadceabcc14e40bfb1f.