Bug #2547

open

crashed while doing a dry run of pkg_rolling-replace

Added by phma over 8 years ago. Updated over 8 years ago.

Status: New
Priority: High
Assignee: -
Category: -
Target version:
Start date: 04/13/2013
Due date:
% Done: 0%
Estimated time:

Description

I rebooted into a kernel compiled from sources I updated just a few days ago. I ran pkg_rolling-replace -nuv and it ran for a few hours, then crashed. The kernel dump is number 14 in my crash directory. The version is v3.5.0.25.g97861-DEVELOPMENT.

(kgdb) #0 _get_mycpu () at ./machine/thread.h:79
#1 md_dumpsys (di=0xc03e8b35)
    at /usr/src/sys/platform/pc32/i386/dump_machdep.c:266
#2 0xc09646a0 in db_command_table ()
#3 0xc03e8b35 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:913
#4 0xc01968a3 in db_fncall (dummy1=-831973476, dummy2=0, dummy3=-1065827167,
    dummy4=0xce691798 "(\361~\300\336L\212\300")
    at /usr/src/sys/ddb/db_command.c:539
#5 0xc0196ca2 in db_command (aux_cmd_tablep_end=<optimized out>,
    aux_cmd_tablep=<optimized out>, cmd_table=<optimized out>,
    last_cmdp=0xc09f9318) at /usr/src/sys/ddb/db_command.c:401
#6 db_command_loop () at /usr/src/sys/ddb/db_command.c:467
#7 0xc019984b in db_trap (type=type@entry=12, code=code@entry=0)
    at /usr/src/sys/ddb/db_trap.c:71
#8 0xc07903d0 in kdb_trap (type=type@entry=12, code=code@entry=0,
    regs=regs@entry=0xce69190c)
    at /usr/src/sys/platform/pc32/i386/db_interface.c:149
#9 0xc07c2602 in trap_fatal (frame=frame@entry=0xce69190c, eva=eva@entry=136)
    at /usr/src/sys/platform/pc32/i386/trap.c:1107
#10 0xc07c27bd in trap_pfault (frame=frame@entry=0xce69190c, usermode=0,
    usermode@entry=136, eva=eva@entry=136)
    at /usr/src/sys/platform/pc32/i386/trap.c:1018
#11 0xc07c2dc3 in trap (frame=0xce69190c)
    at /usr/src/sys/platform/pc32/i386/trap.c:695
#12 0xc0791c37 in calltrap ()
    at /usr/src/sys/platform/pc32/i386/exception.s:787
#13 0xc0686b17 in fq_balance_self (tdio=0xc09edf00)
    at /usr/src/sys/kern/dsched/fq/fq_core.c:351
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The system also froze the first time I started a KDE session after rebooting.

Actions #1

Updated by phma over 8 years ago

The kernel crashed again at 1:08. I was not using the machine at the time; that is when the nightly periodic job runs. Maybe there's a bug in HAMMER. I rebooted into the previous kernel.

Actions #2

Updated by phma over 8 years ago

It is now 1:29 and hammer (new world) is still running on the old kernel. The bug is therefore in the hammer code of the kernel.

Actions #3

Updated by alexh over 8 years ago

The bug is most definitely not in HAMMER but rather in dsched or dsched fq.

Cheers,
Alex
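
The reasoning can be read off the backtrace in the description: frames #0 through #12 are the dump, debugger, and trap-handling path, and the frame just outside calltrap is the code that was running when the page fault was taken. The following Python sketch picks that frame out of an excerpt of the trace. The helper names (parse_frames, faulting_frame) are hypothetical and not part of any DragonFly tool, and the sketch assumes the usual kgdb frame layout shown above.

```python
import re

# Excerpt of the kgdb backtrace from the description (frames #9 to #13).
BACKTRACE = r"""
#9 0xc07c2602 in trap_fatal (frame=frame@entry=0xce69190c, eva=eva@entry=136)
    at /usr/src/sys/platform/pc32/i386/trap.c:1107
#10 0xc07c27bd in trap_pfault (frame=frame@entry=0xce69190c, usermode=0,
    usermode@entry=136, eva=eva@entry=136)
    at /usr/src/sys/platform/pc32/i386/trap.c:1018
#11 0xc07c2dc3 in trap (frame=0xce69190c)
    at /usr/src/sys/platform/pc32/i386/trap.c:695
#12 0xc0791c37 in calltrap ()
    at /usr/src/sys/platform/pc32/i386/exception.s:787
#13 0xc0686b17 in fq_balance_self (tdio=0xc09edf00)
    at /usr/src/sys/kern/dsched/fq/fq_core.c:351
"""

# "#N [0xADDR in ]function (" at the start of a line opens a new frame.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(")
# "at path:line" gives the source location; it may sit on a continuation line.
LOCATION_RE = re.compile(r"\bat (\S+):(\d+)")


def parse_frames(text):
    """Return a list of frames with number, function, file, and line."""
    frames = []
    for line in text.splitlines():
        frame = FRAME_RE.match(line)
        if frame:
            frames.append({"num": int(frame.group(1)),
                           "func": frame.group(2),
                           "file": None,
                           "line": None})
        loc = LOCATION_RE.search(line)
        if loc and frames and frames[-1]["file"] is None:
            frames[-1]["file"] = loc.group(1)
            frames[-1]["line"] = int(loc.group(2))
    return frames


def faulting_frame(frames, trap_entry="calltrap"):
    """Return the frame that was interrupted by the trap, if present."""
    for i, frame in enumerate(frames):
        if frame["func"] == trap_entry and i + 1 < len(frames):
            return frames[i + 1]
    return None


frames = parse_frames(BACKTRACE)
culprit = faulting_frame(frames)
print(culprit["func"], culprit["file"], culprit["line"])
# prints: fq_balance_self /usr/src/sys/kern/dsched/fq/fq_core.c 351
```

The interrupted frame is fq_balance_self in the dsched fq sources, with no HAMMER function on the stack. The fault address reported by trap_pfault (eva=136) is very low, which would be consistent with a member access through a NULL pointer, though the trace alone does not prove that.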

On 2013-04-16 06:30, Pierre Abbat via Redmine wrote:

Issue #2547 has been updated by phma.

It is now 1:29 and hammer (new world) is still running on the old
kernel. The bug is therefore in the hammer code of the kernel.

Actions #4

Updated by phma over 8 years ago

Both disks are set to fq. Should I set them to something else and see what happens?

Actions #5

Updated by vsrinivas over 8 years ago

Yeah, you'll probably want to use the noop scheduler; it is the best-tested of all the dsched modules.

Actions #6

Updated by phma over 8 years ago

Confirmed: the bug is in fq. I rebooted with both disks set to noop, and the system is still up.
