Bug #1849
panic: dsched: malloc limit exceeded
Status: closed
Description
Hi,
I can reproduce the panic above quite easily - just doing a pkgsrc
build of my selected packages does the job - somewhere in the perl build.
System is built from a recent master
v2.7.3.1084.g7b033-DEVELOPMENT, running on a Phenom II 955 in an M4A78L-M
motherboard with 4GB of memory, a 32GB SSD providing an 8GB root and 2 8GB
swap partitions (with swapcache enabled) and a pair of terabyte drives in a
vinum-based RAID1 holding a 50GB UFS partition and the rest Hammer. The
package build was taking place in a chroot on the 50GB UFS partition.
Trace beginning at frame 0xf0eceb00
panic(ffffffff) at panic+0x174
panic(c05f1717,c05f270a,b715ed0,0,f14d0100) at panic+0x174
kmalloc(100,c06a9480,2,f0eceb90,c030fc68) at kmalloc+0x92
objcache_malloc_alloc(c06a9540,2,c6a91640,e1dd9d18,dccc9918) at objcache_malloc_alloc+0x15
objcache_get(c6a91618,2,dccc9918,f0fcd180,ff800000) at objcache_get+0x1af
dsched_thread_io_alloc(dccc9918,f0fcd180,c06a9560) at dsched_thread_io_alloc+0x19
dsched_thread_ctx_alloc(0,e219c758,ff800000,f0ecebf4,c0322faa) at dsched_thread_ctx_alloc+0x5b
dsched_new_thread(e219c758) at dsched_new_thread+0x30
lwkt_init_thread(e219c758,eff64000,3000,600,ff800000) at lwkt_init_thread+0xfd
lwkt_alloc_thread(0,3000,ffffffff,0,c0000034) at lwkt_alloc_thread+0xb7
lwp_fork(e20ec360,dcc7d3e0,c0000034) at lwp_fork+0xed
fork1(f12997f8,c0000034,f0ececb4,c032049f,e1dd9d18) at fork1+0x5bb
sys_vfork(f0ececf0,0,0,c06a08f8,246) at sys_vfork+0x4a
syscall2(f0eced40) at syscall2+0x2b0
Xint0x80_syscall() at Xint0x80_syscall+0x36
boot() called on cpu#0
Updated by steve about 14 years ago
On Wed, 22 Sep 2010 17:30:32 +0100
"Steve O'Hara-Smith" <steve@sohara.org> wrote:
> Hi,
> I can reproduce the panic above quite easily - just doing a pkgsrc
> build of my selected packages does the job - somewhere in the perl build.
It seems this is caused by doing the pkgsrc build on a UFS
filesystem; moving it to a hammer filesystem has stopped dsched memory use
from growing out of bounds.
Updated by vsrinivas about 14 years ago
This is pretty interesting! Can you grab the output of vmstat -m?
thanks,
-- vs
Updated by steve about 14 years ago
On Thu, 23 Sep 2010 12:51:28 +0000
"Venkatesh Srinivas \(via DragonFly issue tracker\)"
<bugs@crater.dragonflybsd.org> wrote:
Venkatesh Srinivas <me@acm.jhu.edu> added the comment:
This is pretty interesting! Can you grab the output of vmstat -m?
Attached is the vmstat -m output from the crash report - dsched really
is maxed out.
Experimenting this morning I found that installing DragonFly into
the chroot for the package build caused dsched usage to hit around 120K
items using about 24MB of memory in the time it took to do an installworld.
Running the same build on a hammer filesystem is causing it to
creep up slowly - but it is creeping up.
Updated by alexh about 14 years ago
The output of test/dsched would help here. It'll show the number of
currently allocated structures of each type.
Cheers,
Alex
Updated by steve about 14 years ago
On Thu, 23 Sep 2010 14:36:34 +0100
Alex Hornung <ahornung@gmail.com> wrote:
> The output of test/dsched would help here. It'll show the number of
> currently allocated structures of each type.
This is from my system at current, which had the initial spike from
preparing a chroot on UFS and has been running a pkgsrc build on hammer for
some time now.
./dschedstats
Allocations
thread_ctx: 16998
thread_io: 118986
disk_ctx: 7
---------------------------------------------
Procs/Threads tracked
procs: 133
threads: 217
---------------------------------------------
Transactions
w/o thread_ctx: 2
Updated by alexh about 14 years ago
On 23/09/10 15:00, Steve O'Hara-Smith wrote:
> On Thu, 23 Sep 2010 14:36:34 +0100
> Alex Hornung <ahornung@gmail.com> wrote:
>
>> The output of test/dsched would help here. It'll show the number of
>> currently allocated structures of each type.
>
> This is from my system at current, which had the initial spike from
> preparing a chroot on UFS and has been running a pkgsrc build on hammer for
> some time now.
>
> ./dschedstats
> Allocations
> thread_ctx: 16998
> thread_io: 118986
> disk_ctx: 7
> ---------------------------------------------
> Procs/Threads tracked
> procs: 133
> threads: 217
> ---------------------------------------------
> Transactions
> w/o thread_ctx: 2
The number of procs+threads should match the number of thread_ctx. That
clearly isn't the case, which indicates that there are some stale
references on the thread_ctx. I don't have much time to look into it
this week, but I'll take a look at it next week.
Cheers,
Alex
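Alex's invariant is easy to check against the numbers above. A worked
example in plain C, using the figures from the dschedstats output (this
is just arithmetic on the reported counts, not dsched code):

#include <stdio.h>

int main(void)
{
    /* figures from the dschedstats output earlier in this thread */
    int thread_ctx = 16998;
    int procs = 133;
    int threads = 217;

    printf("expected thread_ctx: %d\n", procs + threads);  /* 350 */
    printf("actual thread_ctx:   %d\n", thread_ctx);       /* 16998 */
    printf("stale contexts:      %d\n",
        thread_ctx - (procs + threads));                   /* 16648 */
    return 0;
}

Over 16000 contexts are pinned by something other than a live proc or
thread, which is the stale-reference problem Alex describes.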
Updated by dillon about 14 years ago
:> thread_ctx: 16998
:> thread_io: 118986
:> disk_ctx: 7
:> ---------------------------------------------
:The number of procs+threads should match the number of thread_ctx. That
:clearly isn't the case, which indicates that there are some stale
:references on the thread_ctx. I don't have much time to look into it
:this week, but I'll take a look at it next week.
:
:Cheers,
:Alex
I'll mess with it a bit. It's pretty clear that the extra thread_ctx's
are getting stuck due to thread_io references which are not being torn
down. Now the question is how and where.
It looks like there are a few places where subsystems still generate
their own pbufs. CCD and the original ata driver (not nata) do, and
in fact so does devfs's device-special I/O.
Are you using CCD or VN by any chance? If not then there must be
some interaction with devfs.
I will commit some fixes in a bit so you can test as well, since
you are able to reproduce it easily.
-Matt
Matthew Dillon
<dillon@backplane.com>
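Matt's diagnosis can be sketched in miniature. The userland model below
is illustrative only - dsched_attach_buf(), dsched_detach_buf(),
driver_getpbuf() and driver_relpbuf() are hypothetical stand-ins, not
the real kernel API - but it shows why a subsystem that rolls its own
pbufs leaks one thread_io reference per I/O unless every scheduler
attach is paired with a detach before the buffer is recycled:

#include <stdio.h>
#include <stdlib.h>

static int thread_io_refs;  /* stand-in for dsched's thread_io refcount */

struct buf {
    int dsched_attached;
};

static void dsched_attach_buf(struct buf *bp)   /* hypothetical */
{
    bp->dsched_attached = 1;
    thread_io_refs++;
}

static void dsched_detach_buf(struct buf *bp)   /* hypothetical */
{
    if (bp->dsched_attached) {
        bp->dsched_attached = 0;
        thread_io_refs--;
    }
}

static struct buf *driver_getpbuf(void)
{
    return calloc(1, sizeof(struct buf));
}

static void driver_relpbuf(struct buf *bp)
{
    free(bp);
}

/* buggy: the private pbuf is recycled without the dsched teardown */
static void driver_io_leaky(void)
{
    struct buf *bp = driver_getpbuf();

    dsched_attach_buf(bp);
    /* ... I/O runs and completes ... */
    driver_relpbuf(bp);         /* thread_io reference is still held */
}

/* fixed: the attach is paired with a detach before the buffer goes away */
static void driver_io_fixed(void)
{
    struct buf *bp = driver_getpbuf();

    dsched_attach_buf(bp);
    /* ... I/O runs and completes ... */
    dsched_detach_buf(bp);
    driver_relpbuf(bp);
}

int main(void)
{
    int i;

    for (i = 0; i < 1000; i++)
        driver_io_leaky();
    printf("refs leaked by the buggy path: %d\n", thread_io_refs); /* 1000 */

    thread_io_refs = 0;
    for (i = 0; i < 1000; i++)
        driver_io_fixed();
    printf("refs leaked by the fixed path: %d\n", thread_io_refs); /* 0 */
    return 0;
}

Every leaked reference keeps a thread_io alive, which in turn keeps its
thread_ctx alive - exactly the growth pattern in the dschedstats output.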
Updated by dillon about 14 years ago
Looks like the NATA raid code also, and vinum.
-Matt
Updated by steve about 14 years ago
On Thu, 23 Sep 2010 16:31:08 -0700 (PDT)
Matthew Dillon <dillon@apollo.backplane.com> wrote:
> I'll mess with it a bit. It's pretty clear that the extra
> thread_ctx's are getting stuck due to thread_io references which are not
> being torn down. Now the question is how and where.
>
> It looks like there are a few places where subsystems still generate
> their own pbufs. CCD and the original ata driver (not nata) do, and
> in fact so does devfs's device-special I/O.
>
> Are you using CCD or VN by any chance? If not then there must be
> some interaction with devfs.
No CCD or VN; I am using a vinum RAID1.
Updated by steve about 14 years ago
On Thu, 23 Sep 2010 16:44:13 -0700 (PDT)
Matthew Dillon <dillon@apollo.backplane.com> wrote:
> Looks like the NATA raid code also, and vinum.
It looks like your fixes have nailed this one down. I just tried
the worst case I know (pushing lots of files to UFS on vinum RAID1); the
dsched numbers went up and then came back down to what they were before I
started.
Updated by vsrinivas about 14 years ago
Seems to be resolved; buffers were being allocated/deallocated incorrectly in a
number of subsystems.
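To tie the resolution back to the numbers earlier in the thread: each
thread_io holds a reference on its owning thread_ctx, so a thread_io that
is never torn down pins the context of a thread that has already exited.
A minimal refcount model of that relationship (illustrative names and
assumed semantics, not the actual dsched code):

#include <stdio.h>
#include <stdlib.h>

static int ctx_alive;   /* live thread_ctx count, as dschedstats reports */

struct thread_ctx {
    int refs;
};

static struct thread_ctx *ctx_alloc(void)
{
    struct thread_ctx *ctx = calloc(1, sizeof(*ctx));

    ctx->refs = 1;      /* the owning thread's reference */
    ctx_alive++;
    return ctx;
}

static void ctx_release(struct thread_ctx *ctx)
{
    if (--ctx->refs == 0) {
        ctx_alive--;
        free(ctx);
    }
}

/* a thread_io pins the context until it is torn down */
static void tdio_ref(struct thread_ctx *ctx)
{
    ctx->refs++;
}

static void tdio_unref(struct thread_ctx *ctx)
{
    ctx_release(ctx);
}

int main(void)
{
    struct thread_ctx *ctx = ctx_alloc();

    tdio_ref(ctx);      /* an I/O goes through the scheduler */
    ctx_release(ctx);   /* the thread exits */

    /* before the fix the thread_io was never torn down: */
    printf("live contexts after thread exit: %d\n", ctx_alive); /* 1 */

    tdio_unref(ctx);    /* the fix: tear the thread_io down as well */
    printf("after the teardown:              %d\n", ctx_alive); /* 0 */
    return 0;
}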