Bug #1849: panic: dsched: malloc limit exceeded - DragonFlyBSD - DragonFlyBSD bugtracker

Bug #1849

closed

panic: dsched: malloc limit exceeded

Added by steve about 15 years ago. Updated about 15 years ago.

Status: Closed
Priority: Normal

Description

Hi,

I can reproduce the panic above quite easily - just doing a pkgsrc
build of my selected packages does the job - somewhere in the perl build.

System is built from a recent master
v2.7.3.1084.g7b033-DEVELOPMENT, running on a Phenom II 955 in an M4A78L-M
motherboard with 4GB of memory, a 32GB SSD providing an 8GB root and two 8GB
swap partitions (with swapcache enabled), and a pair of terabyte drives in a
vinum-based RAID1 holding a 50GB UFS partition and the rest Hammer. The
package build was taking place in a chroot on the 50GB UFS partition.

Trace beginning at frame 0xf0eceb00
panic(ffffffff) at panic+0x174
panic(c05f1717,c05f270a,b715ed0,0,f14d0100) at panic+0x174
kmalloc(100,c06a9480,2,f0eceb90,c030fc68) at kmalloc+0x92
objcache_malloc_alloc(c06a9540,2,c6a91640,e1dd9d18,dccc9918) at objcache_malloc_alloc+0x15
objcache_get(c6a91618,2,dccc9918,f0fcd180,ff800000) at objcache_get+0x1af
dsched_thread_io_alloc(dccc9918,f0fcd180,c06a9560) at dsched_thread_io_alloc+0x19
dsched_thread_ctx_alloc(0,e219c758,ff800000,f0ecebf4,c0322faa) at dsched_thread_ctx_alloc+0x5b
dsched_new_thread(e219c758) at dsched_new_thread+0x30
lwkt_init_thread(e219c758,eff64000,3000,600,ff800000) at lwkt_init_thread+0xfd
lwkt_alloc_thread(0,3000,ffffffff,0,c0000034) at lwkt_alloc_thread+0xb7
lwp_fork(e20ec360,dcc7d3e0,c0000034) at lwp_fork+0xed
fork1(f12997f8,c0000034,f0ececb4,c032049f,e1dd9d18) at fork1+0x5bb
sys_vfork(f0ececf0,0,0,c06a08f8,246) at sys_vfork+0x4a
syscall2(f0eced40) at syscall2+0x2b0
Xint0x80_syscall() at Xint0x80_syscall+0x36
boot() called on cpu#0

Files

xx (11.4 KB)

steve, 09/23/2010 01:24 PM

Updated by steve about 15 years ago

On Wed, 22 Sep 2010 17:30:32 +0100
"Steve O'Hara-Smith" <steve@sohara.org> wrote:

> Hi,
>
> I can reproduce the panic above quite easily - just doing a pkgsrc
> build of my selected packages does the job - somewhere in the perl build.

It seems this is caused by doing the pkgsrc build on a UFS
filesystem; moving it to a hammer filesystem has stopped dsched memory use
from growing out of bounds.

Updated by vsrinivas about 15 years ago

This is pretty interesting! Can you grab the output of vmstat -m?

thanks,
-- vs

Updated by steve about 15 years ago

On Thu, 23 Sep 2010 12:51:28 +0000
"Venkatesh Srinivas \(via DragonFly issue tracker\)"
<bugs@crater.dragonflybsd.org> wrote:

> Venkatesh Srinivas <me@acm.jhu.edu> added the comment:
>
> This is pretty interesting! Can you grab the output of vmstat -m?

I've attached the vmstat -m output from the crash report; dsched really is
maxed out.

Experimenting this morning, I found that installing DragonFly into
the chroot for the package build caused dsched usage to reach around 120K
items, using about 24MB of memory, in the time it took to do an installworld.

Running the same build on a hammer filesystem causes it to
creep up slowly - but it is creeping up.

Updated by alexh about 15 years ago

The output of test/dsched would help here. It'll show the number of
currently allocated structures of each type.

Cheers,
Alex

Updated by steve about 15 years ago

On Thu, 23 Sep 2010 14:36:34 +0100
Alex Hornung <ahornung@gmail.com> wrote:

> The output of test/dsched would help here. It'll show the number of
> currently allocated structures of each type.

This is from my system as it currently stands; it had the initial spike from
preparing a chroot on UFS and has been running a pkgsrc build on hammer for
some time now.

./dschedstats 
Allocations
thread_ctx:    16998
thread_io:    118986
disk_ctx:    7
---------------------------------------------
Procs/Threads tracked
procs:        133
threads:    217
---------------------------------------------
Transactions
w/o thread_ctx:    2

Updated by alexh about 15 years ago

On 23/09/10 15:00, Steve O'Hara-Smith wrote:

> On Thu, 23 Sep 2010 14:36:34 +0100
> Alex Hornung <ahornung@gmail.com> wrote:
>
>> The output of test/dsched would help here. It'll show the number of
>> currently allocated structures of each type.
>
> This is from my system at current, which had the initial spike from
> preparing a chroot on UFS and has been running a pkgsrc build on hammer for
> some time now.
>
> ./dschedstats
> Allocations
> thread_ctx: 16998
> thread_io: 118986
> disk_ctx: 7
> ---------------------------------------------
> Procs/Threads tracked
> procs: 133
> threads: 217
> ---------------------------------------------
> Transactions
> w/o thread_ctx: 2

The number of procs+threads should match the number of thread_ctx. That
clearly isn't the case, which indicates that there are some stale
references on the thread_ctx. I don't have much time to look into it
this week, but I'll take a look at it next week.

Cheers,
Alex

Updated by dillon about 15 years ago

:> thread_ctx: 16998
:> thread_io: 118986
:> disk_ctx: 7
:> ---------------------------------------------
:The number of procs+threads should match the number of thread_ctx. That
:clearly isn't the case, which indicates that there are some stale
:references on the thread_ctx. I don't have much time to look into it
:this week, but I'll take a look at it next week.
:
:Cheers,
:Alex

I'll mess with it a bit. It's pretty clear that the extra thread_ctx's
are getting stuck due to thread_io references which are not being torn
down. Now the question is how and where.

It looks like there are a few places where subsystems still generate
their own pbufs. CCD and the original ata driver (not nata) do, and in
fact so does devfs's device-special I/O.

Are you using CCD or VN by any chance? If not, then there must be
some interaction with devfs.

I will commit some fixes in a bit so you can test as well, since
you are able to reproduce it easily.

-Matt
                    Matthew Dillon
                    <dillon@backplane.com>

Updated by dillon about 15 years ago

Looks like the NATA raid code also, and vinum.

-Matt

Updated by steve about 15 years ago

On Thu, 23 Sep 2010 16:31:08 -0700 (PDT)
Matthew Dillon <dillon@apollo.backplane.com> wrote:

> I'll mess with it a bit. It's pretty clear that the extra
> thread_ctx's are getting stuck due to thread_io references which are not
> being torn down. Now the question is how and where.
>
> It looks like there are a few places where subsystems still generate
> their own pbufs. CCD, the original ata driver (not nata), and
> in fact devfs's device-special I/O is too.
>
> Are you using CCD or VN by any chance? If not then there must be
> some interaction with devfs.

No CCD or VN, I am using a vinum RAID1.

Updated by steve about 15 years ago

On Thu, 23 Sep 2010 16:44:13 -0700 (PDT)
Matthew Dillon <dillon@apollo.backplane.com> wrote:

> Looks like the NATA raid code also, and vinum.

It looks like your fixes have nailed this one down. I just tried
the worst case I know (pushing lots of files to UFS on vinum RAID1): the
dsched numbers went up and then back down to what they were before I
started.

Updated by vsrinivas about 15 years ago

Seems to be resolved; buffers were being allocated/deallocated incorrectly in a
number of subsystems.
